From: "Chen, Pei"
To: "ctakes-dev@incubator.apache.org"
Subject: RE: type system changes needed to read SHARP data
Date: Thu, 6 Dec 2012 19:20:42 +0000

Hi Steven,

+1, it seems reasonable.

Just taking a step back, should there always be a 1-1 mapping between human
annotated data (Knowtator schema) and the system annotated data (cTAKES type
system)?
If this is true, should they really share the schema? I.e., can the
annotation tool(s) be auto-generated from/based off the type system schema, or
vice versa?
Just thinking of ways we may save time with mappings...

--Pei

> -----Original Message-----
> From: Wu, Stephen T., Ph.D. [mailto:Wu.Stephen@mayo.edu]
> Sent: Wednesday, December 05, 2012 3:37 PM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: type system changes needed to read SHARP data
>
> Sorry for the delayed response, Steve. The type system was not designed to
> house the annotations, but rather the later results of processing. It makes
> sense to do both.
>
> Takeaways, first, then point-by-point response.
> For 3.1.0 the type system should include more than just "LabMention,
> ProcedureMention, SignSymptomMention, DiseaseDisorderMention,
> AnatomicalSiteMention." It should also include the exhaustive list of
> attributes, which would come as subtypes of Modifier.
>
>
> Let me hear some +1s and we'll make it happen...
>
> stephen
>
>
> >> "Clinical_attribute" -- is this what you're looking for:
> >> org.apache.ctakes.typesystem.type.refsem.Attribute
> >> It inherits from Element.
> > But Attribute is a TOP and we need an Annotation here. (An added
> > concern is, does it really make sense to have a raw Attribute, and not
> > some specific sub-type like BodyLaterality or BodySide?)
> To capture the Knowtator annotations, yes, we do need an Annotation --
> namely Modifier subtypes, as you've suggested.
> Attribute is not really meant to be instantiated; it is just meant to be a
> super-type that could feasibly provide easier indexing.
>
> >> Lab should be at org.apache.ctakes.typesystem.type.refsem.Lab
> > But Lab is a TOP, and we need an Annotation here.
> Again, for the case of reading in Knowtator, yes. I think the addition of
> LabMention, etc., was slated for 3.1.0, right, James?
>
> >> Use the type org.apache.ctakes.typesystem.type.textsem.Modifier with
> >> the "category" feature.
> > Should there be constants for each of these categories?
> There are constants in
> /ctakes-type-system/src/main/java/org/apache/ctakes/typesystem/type/constants/CONST.java
>
> >> "Person", --> Entity
> > But Entity is a TOP, not an Annotation.
> This is an interesting question. Person was not previously included in a CEM,
> so it doesn't have a semantic TOP subtype. Therefore, it also doesn't have an
> Annotation subtype. For now we'll just leave it be.
>
> >>> After working with this data I think we should consider having
> >>> separate UIMA Annotation sub-types for each of the things that are
> >>> Modifiers now. For example, if we have a real Severity Annotation
> >>> for textual mentions of severity, then the CAS makes it easy to select
> >>> these.
> I think we're lining up with you on this now.
>
> > The types we're talking about are not
> > used locally within a single AnalysisEngine. They're read in from the
> > SHARPKnowtatorXMLReader AnalysisEngine, and used separately...
> > So they can't be local to a
> > single AnalysisEngine, and they must be in the CAS.
> Agreed, because of the gold standard representation issue.
>
> > That's exactly what I'm talking about with the severity modifiers. We
> > have a severity modifier extraction annotator, and we *do* need to
> > evaluate its performance by comparing the severity modifiers it
> > extracts to those in the annotated data... So we really do want
> > everything that's in the Knowtator XML annotations to be loaded and
> > accessible to all our UIMA AnalysisEngines.
> Ok. There is a slight difference in finding modifiers because, for the most
> part, annotators wouldn't mark, e.g., a negation term that didn't modify
> anything clinically interesting. But there are enough cases where an attribute
> should be searched for and evaluated on its own that I suppose it's worth it
> to add all these Modifier subtypes.
>
> >> 2) Will these modifiers be reusable downstream?
> > I'm not sure what you mean here. Are you suggesting that the type
> > system should only have types for things that external users of cTAKES
> > might need, and that we shouldn't have types for things that must be
> > passed between different cTAKES AnalysisEngines?
> Sorry for being unclear: "downstream" in this context meant "to other UIMA
> components in the NLP pipeline."
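
For readers following the thread, the trade-off discussed above (one Modifier type with a "category" feature versus a separate Annotation subtype per kind of modifier) can be pictured with a small, self-contained Java sketch. This is only an illustration under stated assumptions: it does not use the real UIMA or cTAKES classes. Every class below (Annotation, Modifier, SeverityModifier, BodySideModifier, ModifierSelectionSketch) and every category string is a hypothetical stand-in, and a plain list plays the role of the CAS index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a UIMA Annotation: a span with begin/end offsets.
class Annotation {
    final int begin;
    final int end;
    Annotation(int begin, int end) { this.begin = begin; this.end = end; }
}

// Hypothetical Modifier with a "category" feature (a String here, for illustration only).
class Modifier extends Annotation {
    final String category;
    Modifier(int begin, int end, String category) {
        super(begin, end);
        this.category = category;
    }
}

// Hypothetical subtypes: one Annotation subtype per kind of modifier.
class SeverityModifier extends Modifier {
    SeverityModifier(int begin, int end) { super(begin, end, "severity"); }
}

class BodySideModifier extends Modifier {
    BodySideModifier(int begin, int end) { super(begin, end, "body_side"); }
}

class ModifierSelectionSketch {
    // Select by type: with dedicated subtypes, the type itself is the query.
    static <T extends Annotation> List<T> select(List<Annotation> index, Class<T> type) {
        List<T> out = new ArrayList<>();
        for (Annotation a : index) {
            if (type.isInstance(a)) {
                out.add(type.cast(a));
            }
        }
        return out;
    }

    // Select by category: with a single Modifier type, callers must filter on the feature value.
    static List<Modifier> selectByCategory(List<Annotation> index, String category) {
        List<Modifier> out = new ArrayList<>();
        for (Annotation a : index) {
            if (a instanceof Modifier && ((Modifier) a).category.equals(category)) {
                out.add((Modifier) a);
            }
        }
        return out;
    }

    // A tiny made-up index: one plain annotation and three modifiers.
    static List<Annotation> sampleIndex() {
        List<Annotation> index = new ArrayList<>();
        index.add(new Annotation(0, 5));
        index.add(new SeverityModifier(10, 16));
        index.add(new BodySideModifier(20, 24));
        index.add(new Modifier(30, 33, "negation"));
        return index;
    }

    public static void main(String[] args) {
        List<Annotation> index = sampleIndex();
        System.out.println("Severity by type:     " + select(index, SeverityModifier.class).size());
        System.out.println("Severity by category: " + selectByCategory(index, "severity").size());
        System.out.println("All modifiers:        " + select(index, Modifier.class).size());
    }
}
```

In this sketch both queries find the same severity mention, but the type-based query needs no agreed-upon category constant, while the shared Modifier super-type still allows selecting all modifiers at once.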