incubator-ctakes-dev mailing list archives

From "Wu, Stephen T., Ph.D." <>
Subject Re: type system changes needed to read SHARP data
Date Wed, 05 Dec 2012 20:36:19 GMT
Sorry for the delayed response, Steve.  The type system was not designed to
house the annotations, but rather the later results of processing.  It makes
sense to do both.  

Takeaways first, then a point-by-point response.
For 3.1.0 the type system should include more than just "LabMention,
ProcedureMention, SignSymptomMention, DiseaseDisorderMention,
AnatomicalSiteMention."  It should also include the exhaustive list of
attributes, which would come as subtypes of Modifier.

Let me hear some +1s and we'll make it happen...


>> "Clinical_attribute" -- is this what you're looking for:
>> org.apache.ctakes.typesystem.type.refsem.Attribute
>> It inherits from Element.
> But Attribute is a TOP and we need an Annotation here. (An added concern is,
> does it really make sense to have a raw Attribute, and not some specific
> sub-type like BodyLaterality or BodySide?)
To capture the Knowtator annotations, yes, we do need an Annotation --
namely Modifier subtypes, as you've suggested.
Attribute is not really meant to be instantiated; it is just meant to be a
super-type that could feasibly provide easier indexing.

>> Lab should be at org.apache.ctakes.typesystem.type.refsem.Lab
> But Lab is a TOP, and we need an Annotation here.
Again, for the case of reading in Knowtator, yes.  I think the addition of
LabMention, etc., was slated for 3.1.0, right, James?

>> Use the type org.apache.ctakes.typesystem.type.textsem.Modifier with the
>> "category" feature.
> Should there be constants for each of these categories?
There are constants in

>> "Person", --> Entity
> But Entity is a TOP, not an Annotation.
This is an interesting question.  Person was not previously included in a
CEM, so it doesn't have a semantic TOP subtype.  Therefore, it also doesn't
have an Annotation subtype.  For now we'll just leave it be.

>>> After working with this data I think we should consider having separate UIMA
>>> Annotation sub-types for each of the things that are Modifiers now. For
>>> example, if we have a real Severity Annotation for textual mentions of
>>> severity, then the CAS makes it easy to select these.
I think we're lining up with you on this now.
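
To make the difference concrete, here is a minimal sketch in plain Java.  It
does not use the real UIMA or cTAKES classes: Span, Modifier and
SeverityModifier are stand-ins, the String-valued category is an assumption,
and the SeverityModifier name is hypothetical.  It only contrasts selecting by
a category value with selecting by type.

```java
import java.util.ArrayList;
import java.util.List;

public class ModifierSelectionSketch {
    // Stand-in for a UIMA Annotation: something with text offsets.
    static class Span {
        final int begin, end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Current layout: one generic Modifier told apart by a category value.
    // (A String category is an assumption made for this sketch.)
    static class Modifier extends Span {
        final String category;
        Modifier(int begin, int end, String category) {
            super(begin, end);
            this.category = category;
        }
    }

    // Proposed layout: a dedicated subtype per attribute (hypothetical name).
    static class SeverityModifier extends Modifier {
        SeverityModifier(int begin, int end) { super(begin, end, "severity"); }
    }

    // Selection by category: every caller must know the category convention.
    static List<Modifier> byCategory(List<Span> index, String category) {
        List<Modifier> out = new ArrayList<>();
        for (Span s : index) {
            if (s instanceof Modifier && category.equals(((Modifier) s).category)) {
                out.add((Modifier) s);
            }
        }
        return out;
    }

    // Selection by type: the type itself is the query.
    static <T extends Span> List<T> byType(List<Span> index, Class<T> type) {
        List<T> out = new ArrayList<>();
        for (Span s : index) {
            if (type.isInstance(s)) {
                out.add(type.cast(s));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> index = new ArrayList<>();
        index.add(new Modifier(0, 2, "negation"));
        index.add(new SeverityModifier(10, 16));
        System.out.println(byCategory(index, "severity").size()); // prints 1
        System.out.println(byType(index, SeverityModifier.class).size()); // prints 1
    }
}
```

Both queries find the same mention here, but the type-based one needs no
agreed-upon category values.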

> The types we're talking about are not
> used locally within a single AnalysisEngine. They're read in from the
> SHARPKnowtatorXMLReader AnalysisEngine, and used separately...
> So they can't be local to a
> single AnalysisEngine, and they must be in the CAS.
Agreed, because of the gold standard representation issue.

> That's exactly what I'm talking about with the severity modifiers. We have a
> severity modifier extraction annotator, and we *do* need to evaluate its
> performance by comparing the severity modifiers it extracts to those in the
> annotated data... So we really do want everything that's in the Knowtator XML
> annotations to be loaded and accessible to all our UIMA AnalysisEngines.
Ok.  There is a slight difference in finding modifiers because, for the most
part, annotators wouldn't mark, e.g., a negation term that didn't modify
anything clinically interesting.  But there are enough cases where an
attribute should be searched for and evaluated on its own that I suppose
it's worth it to add all these Modifier subtypes.
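
As a rough illustration of evaluating an attribute on its own, the sketch
below scores system spans against gold spans.  Exact-offset matching, the
"begin-end" string keys and every name in it are assumptions made for the
example, not how cTAKES evaluation is actually implemented.

```java
import java.util.HashSet;
import java.util.Set;

public class SpanEvalSketch {
    // Encode a span as "begin-end" so spans can be compared as set members.
    static String key(int begin, int end) {
        return begin + "-" + end;
    }

    // Returns {precision, recall} under exact-span matching (an assumption).
    static double[] precisionRecall(Set<String> gold, Set<String> system) {
        Set<String> hits = new HashSet<>(system);
        hits.retainAll(gold); // spans present in both gold and system output
        double precision = system.isEmpty() ? 0.0 : (double) hits.size() / system.size();
        double recall = gold.isEmpty() ? 0.0 : (double) hits.size() / gold.size();
        return new double[] { precision, recall };
    }

    public static void main(String[] args) {
        Set<String> gold = new HashSet<>();
        gold.add(key(10, 16));
        gold.add(key(40, 48));

        Set<String> system = new HashSet<>();
        system.add(key(10, 16)); // matches a gold span
        system.add(key(70, 74)); // spurious

        double[] pr = precisionRecall(gold, system);
        System.out.println("precision=" + pr[0] + " recall=" + pr[1]);
        // prints precision=0.5 recall=0.5
    }
}
```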

>> 2) Will these modifiers be reusable downstream?
> I'm not sure what you mean here. Are you suggesting that the type system
> should only have types for things that external users of cTAKES might need,
> and that we shouldn't have types for things that must be passed between
> different cTAKES AnalysisEngines?
Sorry for being unclear: "downstream" in this context meant "to other UIMA
components in the NLP pipeline."
