incubator-ctakes-dev mailing list archives

From "Wu, Stephen T., Ph.D." <>
Subject Re: type system changes needed to read SHARP data
Date Wed, 05 Dec 2012 20:51:44 GMT
Maybe I should've added some additional considerations around
SHARPKnowtatorXMLReader to the discussion...

Previously, we handled all of these modifiers by mapping them directly onto
the named entities (NEs) they are associated with.  Taking negation as an
example again, we did not create a Modifier subtype for polarity; we just
set the polarity value of the named entity to negated.

Storing all of these attributes as Modifier subtypes in
SHARPKnowtatorXMLReader does not eliminate the need to map these subtypes to
NEs.  The Knowtator data includes both the spans of modifiers AND the
assignment of values to the NEs.
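
To make the "both" point concrete, here is a minimal sketch of a reader
keeping the modifier as its own span and also setting the value on the NE.
All class and field names below are hypothetical stand-ins rather than the
actual cTAKES type system classes, and the -1 value for "negated" is an
assumed convention.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, NOT the real cTAKES type system classes.
class ModifierMappingSketch {

    /** A text span with character offsets, similar in spirit to a UIMA Annotation. */
    static class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /** Hypothetical Modifier subtype for negation cues such as "no" or "denies". */
    static class PolarityModifier extends Span {
        PolarityModifier(int begin, int end) { super(begin, end); }
    }

    /** Hypothetical named entity mention that carries a polarity value. */
    static class EntityMention extends Span {
        int polarity = 1; // assumed convention: 1 = asserted, -1 = negated
        final List<Span> modifiers = new ArrayList<>();
        EntityMention(int begin, int end) { super(begin, end); }
    }

    /** Keeps the cue as its own span AND records the resulting value on the entity. */
    static void attachNegation(EntityMention entity, PolarityModifier cue) {
        entity.modifiers.add(cue);
        entity.polarity = -1;
    }

    public static void main(String[] args) {
        // "no fever": the cue covers [0,2) and the entity covers [3,8).
        PolarityModifier cue = new PolarityModifier(0, 2);
        EntityMention fever = new EntityMention(3, 8);
        attachNegation(fever, cue);
        System.out.println("polarity=" + fever.polarity
                + " modifiers=" + fever.modifiers.size());
    }
}
```

With this shape, the cue span stays available for evaluation on its own,
while consumers that only care about the NE can still read its polarity.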

So there are some modifiers that you'd never be interested in evaluating on
their own, apart from the NEs.  However, I agree with the earlier proposal,
because there are other modifiers that are worth evaluating apart from NEs,
and we should keep things consistent.
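
As a rough illustration of evaluating modifiers on their own, the sketch
below scores system modifier spans against gold spans using exact span
matching.  The class, the "begin-end" key encoding, and the choice of exact
matching are all assumptions for illustration, not how cTAKES evaluation is
actually implemented.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of cTAKES.
class ModifierSpanEvalSketch {

    /** Encodes a span as "begin-end" so that spans compare by value. */
    static String key(int begin, int end) {
        return begin + "-" + end;
    }

    /** Returns {precision, recall} under exact span matching. */
    static double[] score(List<String> gold, List<String> system) {
        Set<String> goldSet = new HashSet<>(gold);
        Set<String> systemSet = new HashSet<>(system);

        // Spans that the system produced and that also appear in the gold set.
        Set<String> correct = new HashSet<>(systemSet);
        correct.retainAll(goldSet);

        double precision = systemSet.isEmpty()
                ? 0.0 : (double) correct.size() / systemSet.size();
        double recall = goldSet.isEmpty()
                ? 0.0 : (double) correct.size() / goldSet.size();
        return new double[] { precision, recall };
    }

    public static void main(String[] args) {
        // Two gold severity spans; the system finds one of them plus a spurious one.
        List<String> gold = Arrays.asList(key(0, 6), key(20, 24));
        List<String> system = Arrays.asList(key(0, 6), key(30, 38));
        double[] pr = score(gold, system);
        System.out.println("precision=" + pr[0] + " recall=" + pr[1]);
    }
}
```

This kind of comparison only works if the gold modifier spans are loaded
into the CAS as annotations in their own right, not just folded into NE
values.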


On 12/5/12 2:36 PM, "Stephen Wu" <> wrote:

> Sorry for the delayed response, Steve.  The type system was not designed to
> house the annotations, but rather the later results of processing.  It makes
> sense to do both.
> Takeaways first, then a point-by-point response.
> For 3.1.0 the type system should include more than just "LabMention,
> ProcedureMention, SignSymptomMention, DiseaseDisorderMention,
> AnatomicalSiteMention."  It should also include the exhaustive list of
> attributes, which would come as subtypes of Modifier.
> Let me hear some +1s and we'll make it happen...
> stephen
>>> "Clinical_attribute" -- is this what you're looking for:
>>> org.apache.ctakes.typesystem.type.refsem.Attribute
>>> It inherits from Element.
>> But Attribute is a TOP and we need an Annotation here. (An added concern is,
>> does it really make sense to have a raw Attribute, and not some specific
>> sub-type like BodyLaterality or BodySide?)
> To capture the Knowtator annotations, yes, we do need an Annotation --
> namely Modifier subtypes, as you've suggested.
> Attribute is not really meant to be instantiated, it is just meant to be a
> super-type that could feasibly provide easier indexing.
>>> Lab should be at org.apache.ctakes.typesystem.type.refsem.Lab
>> But Lab is a TOP, and we need an Annotation here.
> Again, for the case of reading in Knowtator, yes.  I think the addition of
> LabMention, etc., was slated for 3.1.0, right, James?
>>> Use the type org.apache.ctakes.typesystem.type.textsem.Modifier with the
>>> "category" feature.
>> Should there be constants for each of these categories?
> There are constants in
> /ctakes-type-system/src/main/java/org/apache/ctakes/typesystem/type/constants/
>>> "Person", --> Entity
>> But Entity is a TOP, not an Annotation.
> This is an interesting question.  Person was not previously included in a
> CEM, so it doesn't have a semantic TOP subtype.  Therefore, it also doesn't
> have an Annotation subtype.  For now we'll just leave it be.
>>>> After working with this data I think we should consider having separate
>>>> UIMA
>>>> Annotation sub-types for each of the things that are Modifiers now. For
>>>> example, if we have a real Severity Annotation for textual mentions of
>>>> severity, then the CAS makes it easy to select these.
> I think we're lining up with you on this now.
>> The types we're talking about are not
>> used locally within a single AnalysisEngine. They're read in from the
>> SHARPKnowtatorXMLReader AnalysisEngine, and used separately...
>> So they can't be local to a
>> single AnalysisEngine, and they must be in the CAS.
> Agreed, because of the gold standard representation issue.
>> That's exactly what I'm talking about with the severity modifiers. We have a
>> severity modifier extraction annotator, and we *do* need to evaluate its
>> performance by comparing the severity modifiers it extracts to those in the
>> annotated data... So we really do want everything that's in the Knowtator XML
>> annotations to be loaded and accessible to all our UIMA AnalysisEngines.
> Ok.  Finding modifiers is slightly different because, for the most part,
> annotators wouldn't mark, e.g., a negation term that didn't modify
> anything clinically interesting.  But there are enough cases where an
> attribute should be searched for and evaluated on its own that I suppose
> it's worth it to add all these Modifier subtypes.
>>> 2) Will these modifiers be reusable downstream?
>> I'm not sure what you mean here. Are you suggesting that the type system
>> should only have types for things that external users of cTAKES might need,
>> and that we shouldn't have types for things that must be passed between
>> different cTAKES AnalysisEngines?
> Sorry for being unclear: "downstream" in this context meant "to other UIMA
> components in the NLP pipeline."
