incubator-ctakes-dev mailing list archives

From "Chen, Pei" <>
Subject RE: type system changes needed to read SHARP data
Date Thu, 06 Dec 2012 19:20:42 GMT
Hi Steven,
+1 it seems reasonable.

Just taking a step back: should there always be a 1-1 mapping between human-annotated data
(Knowtator schema) and system-annotated data (cTAKES type system)?
If so, should the two really share a schema?  I.e., could the annotation tool(s) be
auto-generated from the type system schema, or vice versa?  Just thinking of ways
we may save time with mappings...


> -----Original Message-----
> From: Wu, Stephen T., Ph.D. []
> Sent: Wednesday, December 05, 2012 3:37 PM
> To:
> Subject: Re: type system changes needed to read SHARP data
> Sorry for the delayed response, Steve.  The type system was not designed to
> house the annotations, but rather the later results of processing.  It makes
> sense to do both.
> Takeaways, first, then point-by-point response.
> For 3.1.0 the type system should include more than just "LabMention,
> ProcedureMention, SignSymptomMention, DiseaseDisorderMention,
> AnatomicalSiteMention."  It should also include the exhaustive list of
> attributes, which would come as subtypes of Modifier.
> Let me hear some +1s and we'll make it happen...
> stephen
> >> "Clinical_attribute" -- is this what you're looking for:
> >> org.apache.ctakes.typesystem.type.refsem.Attribute
> >> It inherits from Element.
> > But Attribute is a TOP and we need an Annotation here. (An added
> > concern is, does it really make sense to have a raw Attribute, and not
> > some specific sub-type like BodyLaterality or BodySide?)
> To capture the Knowtator annotations, yes, we do need an Annotation --
> namely Modifier subtypes, as you've suggested.
> Attribute is not really meant to be instantiated; it is just meant to be a
> super-type that could feasibly provide easier indexing.
> >> Lab should be at org.apache.ctakes.typesystem.type.refsem.Lab
> > But Lab is a TOP, and we need an Annotation here.
> Again, for the case of reading in Knowtator, yes.  I think the addition of
> LabMention, etc., was slated for 3.1.0, right, James?
> >> Use the type org.apache.ctakes.typesystem.type.textsem.Modifier with
> >> the "category" feature.
> > Should there be constants for each of these categories?
> There are constants in
> /ctakes-type-system/src/main/java/org/apache/ctakes/typesystem/type/constants/
> >> "Person", --> Entity
> > But Entity is a TOP, not an Annotation.
> This is an interesting question.  Person was not previously included in a CEM,
> so it doesn't have a semantic TOP subtype.  Therefore, it also doesn't have an
> Annotation subtype.  For now we'll just leave it be.
> >>> After working with this data I think we should consider having
> >>> separate UIMA Annotation sub-types for each of the things that are
> >>> Modifiers now. For example, if we have a real Severity Annotation
> >>> for textual mentions of severity, then the CAS makes it easy to select
> >>> these.
> I think we're lining up with you on this now.
> > The types we're talking about are not
> > used locally within a single AnalysisEngine. They're read in from the
> > SHARPKnowtatorXMLReader AnalysisEngine, and used separately...
> > So they can't be local to a
> > single AnalysisEngine, and they must be in the CAS.
> Agreed, because of the gold standard representation issue.
> > That's exactly what I'm talking about with the severity modifiers. We
> > have a severity modifier extraction annotator, and we *do* need to
> > evaluate its performance by comparing the severity modifiers it
> > extracts to those in the annotated data... So we really do want
> > everything that's in the Knowtator XML annotations to be loaded and
> > accessible to all our UIMA AnalysisEngines.
> Ok.  There is a slight difference in finding modifiers because, for the most
> part, annotators wouldn't mark, e.g., a negation term that didn't modify
> anything clinically interesting.  But there are enough cases where an attribute
> should be searched for and evaluated on its own that I suppose it's worth it
> to add all these Modifier subtypes.
> >> 2) Will these modifiers be reusable downstream?
> > I'm not sure what you mean here. Are you suggesting that the type
> > system should only have types for things that external users of cTAKES
> > might need, and that we shouldn't have types for things that must be
> > passed between different cTAKES AnalysisEngines?
> Sorry for being unclear: "downstream" in this context meant "to other UIMA
> components in the NLP pipeline."
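A minimal sketch of the evaluation described in the thread above: comparing the severity modifiers an annotator extracts against those in the gold (Knowtator) data. It deliberately uses plain Java rather than UIMA or cTAKES classes. `SeveritySpanEval`, `Span`, and `score` are hypothetical names, the offsets are made up for illustration, and exact-span matching is an assumed criterion that the thread does not specify.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class SeveritySpanEval {

    /** Hypothetical stand-in for a typed text mention: a type name plus character offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Span)) {
                return false;
            }
            Span s = (Span) o;
            return type.equals(s.type) && begin == s.begin && end == s.end;
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, begin, end);
        }
    }

    /** Returns {precision, recall, F1}, counting a match only when type and offsets are identical. */
    static double[] score(List<Span> gold, List<Span> system) {
        Set<Span> goldSet = new HashSet<>(gold);
        Set<Span> systemSet = new HashSet<>(system);

        // True positives are the system spans that also appear in the gold set.
        Set<Span> truePositives = new HashSet<>(systemSet);
        truePositives.retainAll(goldSet);
        int tp = truePositives.size();

        double precision = systemSet.isEmpty() ? 0.0 : (double) tp / systemSet.size();
        double recall = goldSet.isEmpty() ? 0.0 : (double) tp / goldSet.size();
        double f1 = (precision + recall == 0.0)
                ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        // Illustrative data only: two gold severity mentions, two system mentions, one shared.
        List<Span> gold = Arrays.asList(
                new Span("Severity", 10, 16), new Span("Severity", 40, 44));
        List<Span> system = Arrays.asList(
                new Span("Severity", 10, 16), new Span("Severity", 70, 78));

        double[] prf = score(gold, system);
        System.out.println("precision=" + prf[0] + " recall=" + prf[1] + " f1=" + prf[2]);
    }
}
```

The sketch only works if gold and system mentions are both available as typed spans. In the thread's terms, that is the case for having the gold annotations loaded into the CAS as Annotation subtypes rather than as TOP objects without offsets.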
