From: "Miller, Timothy"
To: "dev@ctakes.apache.org"
Subject: RE: markable types
Date: Tue, 20 May 2014 11:02:14 +0000

I don't understand this question. Can you try to rephrase it? Or maybe if you tell me what you want to do, that would help me understand.

________________________________________
From: Anirban Chakraborti [chakraborti.anirban@googlemail.com]
Sent: Tuesday, May 20, 2014 6:34 AM
To: dev@ctakes.apache.org
Subject: Re: markable types

thanks again Timothy

final question for now

You had explained that each sentence is parsed and is converted to a
> tree with head and terminal node.
Is the typesystem of cTAKES a feature
> of the node, i.e. can one node belong to two or more typesystems and their
> further attributes, OR is there, for each type system, a syntax tree for
> every sentence parsed? I mean, a sentence has various trees attached to it
> but there is a 1:1 mapping between the node and the typesystem.

Anir


On Tue, May 20, 2014 at 2:17 AM, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:

>
> On 05/18/2014 07:40 AM, Anirban Chakraborti wrote:
> > Timothy,
> >
> > 1. So to get concepts of procedure, lab (if any), disease disorder, sign
> > symptoms, anatomical sites, I would need to do
> >
> > List<MedicationMention> meds = new ArrayList<>(JCasUtil.select(jcas, MedicationMention.class));
> > List<DiseaseDisorderMention> disease = new ArrayList<>(JCasUtil.select(jcas, DiseaseDisorderMention.class));
> > List<SignSymptomMention> signs = new ArrayList<>(JCasUtil.select(jcas, SignSymptomMention.class));
> > List<AnatomicalMention> anatomy = new ArrayList<>(JCasUtil.select(jcas, AnatomicalMention.class));
> > List<LabMention> labs = new ArrayList<>(JCasUtil.select(jcas, LabMention.class));
> >
> > then check the size of each list { meds, disease, signs, anatomy, labs },
> > print out the lists or make a new list using java.util.List or
> > java.util.ArrayList as the case might be. Right ...
> yep
> > 2. I am more interested in the IdentifiedAnnotation class. However, there
> > are concepts like FractionAnnotation which are not defined as enums in
> > const.java. How do I handle them? Do I need to add to the const.java file?
> nope, you probably just want EntityMention (for anatomical sites) and
> EventMention (for all clinical events, including DiseaseDisorder,
> Procedure, SignSymptom, etc.).
>
> >
> > 3. What exactly is the functional difference between, say,
> > MedicationEventMention.java, MedicationMention.java, Medication.java and
> > MedicationEventMention_type.java? I understand there is a similar difference
> > between the classes for lab, procedure, etc.
> The types ending in _type.java are UIMA-internal types, you can ignore.
> Medication is a referential type -- something in the real world that
> could be referred to multiple times in a document. What you probably
> want are the mention types. Here I believe MedicationMention is the
> preferred type going forward for a particular mention of a medication in
> text (MedicationEventMention is the same thing but not preferred going
> forward).
>
>
> >
> > 4. You had explained that each sentence is parsed and is converted to a
> > tree with head and terminal node. Is the typesystem of cTAKES a feature
> > of the node, i.e. can one node belong to two or more typesystems and their
> > further attributes, OR is there, for each type system, a syntax tree for
> > every sentence parsed? I mean, a sentence has various trees attached to it
> > but there is a 1:1 mapping between the node and the typesystem.
> >
> > Many Thanks
> >
> > Anirban
> >
> >
> >
> >
> >
> > On Thu, May 15, 2014 at 5:03 PM, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:
> >
> >> Anir -- I'm not sure I understand your question but from your example it
> >> doesn't sound like a tree exactly.
> >> If you just want a list of medication
> >> concepts you can do something like:
> >>
> >> List<MedicationMention> meds = new ArrayList<>(JCasUtil.select(jcas, MedicationMention.class));
> >> (I believe MedicationMention is the correct class but check your output.)
> >>
> >> If you really do want to put them into a syntax tree, there are also
> >> methods for doing that in the AnnotationTreeUtils class.
> >>
> >> getAnnotationTree(JCas, Annotation) will give you the tree for the whole
> >> sentence containing the annotation you give it.
> >> annotationNode(JCas, Annotation) will give you the smallest subtree
> >> covering the annotation you give it.
> >> insertAnnotationNode(JCas, TopTreebankNode, Annotation, String) will
> >> insert a node into the tree specified at the level specified by the
> >> annotation with the category specified by the string. So for example if you
> >> had meds as above you could then do:
> >>
> >> for(MedicationMention med : meds){
> >>   AnnotationTreeUtils.insertAnnotationNode(jcas,
> >>       AnnotationTreeUtils.getAnnotationTree(jcas, med), med, "MEDICATION");
> >> }
> >>
> >> which would insert a new node into every tree with the label "MEDICATION"
> >> in every position where a medication was found.
> >>
> >> One caveat to the above code is that these methods actually will change
> >> the tree in the cas.
> >> That might be ok for some use cases but for many you
> >> want to work on a tree outside the cas so that's why there are also methods:
> >> getTreeCopy(JCas, TopTreebankNode)
> >> getTreeCopy(JCas, TreebankNode)
> >>
> >> If you use the getAnnotationTree method to obtain the tree you want, then
> >> you can get a copy from these methods, then use the insert methods and do
> >> something with them immediately (like print them out), without altering the
> >> originals in the cas if other AEs may use them.
> >>
> >> Tim
> >>
> >>
> >>
> >> ________________________________________
> >> From: Anirban Chakraborti [chakraborti.anirban@googlemail.com]
> >> Sent: Sunday, May 11, 2014 9:15 AM
> >> To: dev@ctakes.apache.org
> >> Subject: Re: markable types
> >>
> >> Steven,
> >>
> >> Would you have any example code of a tree parser so the output can be
> >> arranged as per need? I mean, after successful annotation, I want to
> >> extract certain concepts like medication only and arrange them in a new
> >> tree so that all annotations in reference to the medication concept and their
> >> sources are listed together.
> >>
> >> Anir
> >>
> >>
> >> On Sun, May 11, 2014 at 3:55 PM, Steven Bethard <steven.bethard@gmail.com> wrote:
> >>> I don't think "not something anyone would want extracted" should be an
> >>> argument against anything. We already have constituent and dependency
> >>> parse trees in the type system, and those would fall under that
> >>> category.
> >>>
> >>> So +1 on markables in the type system. (In general, +1 on moving
> >>> module-specific types to the standard type system.
> >>> I'm not sure what
> >>> the real benefit of splitting them out is...)
> >>>
> >>> Steve
> >>>
> >>> On Fri, May 9, 2014 at 11:53 AM, Miller, Timothy wrote:
> >>>> What do people think about taking the "markable" types out of the
> >>>> coreference project and adding them to the standard type system? This is
> >>>> a pretty standard concept in coreference that doesn't really have a
> >>>> great natural representation in the current type system -- it
> >>>> encompasses IdentifiedAnnotations as well as pronouns ("It", "him",
> >>>> "her") and some determiners ("this").
> >>>>
> >>>> The drawback I can see is that it is probably not something anyone would
> >>>> want extracted -- ultimately you want the actual coref pairs or chains.
> >>>> But it is useful for things like representing gold standard input or
> >>>> splitting coreference resolution into separate markable recognition and
> >>>> relation classification steps.
> >>>>
> >>>> Tim
> >>>>
>
> --
> Tim Miller
> Instructor
> Boston Children's Hospital and Harvard Medical School
> timothy.miller@childrens.harvard.edu
> 617-919-1223
>
>
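
The caveat in the May 15 message quoted above (the insert methods change the tree held in the cas, so copy the tree first if other AEs still need the original) can be illustrated without any cTAKES dependency. The following is only a minimal sketch in plain Java: Node, copy(), wrapChild() and sample() are hypothetical stand-ins invented for illustration, not the cTAKES TreebankNode type or the AnnotationTreeUtils API, and the real methods may behave differently in detail.

```java
import java.util.ArrayList;
import java.util.List;

final class TreeCopySketch {

    // Hypothetical stand-in for a parse-tree node; NOT the cTAKES TreebankNode type.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node kid : kids) {
                children.add(kid);
            }
        }

        // Deep copy: plays the role that getTreeCopy plays in the thread.
        Node copy() {
            Node duplicate = new Node(label);
            for (Node kid : children) {
                duplicate.children.add(kid.copy());
            }
            return duplicate;
        }

        // Wraps the child at the given index in a new node carrying newLabel.
        // This mutates the tree it is called on, which is why we call it on a copy.
        void wrapChild(int index, String newLabel) {
            Node wrapped = children.get(index);
            children.set(index, new Node(newLabel, wrapped));
        }

        // Bracketed rendering of the subtree rooted at this node.
        String render() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node kid : children) {
                sb.append(' ').append(kid.render());
            }
            return sb.append(')').toString();
        }
    }

    // A tiny made-up parse of "aspirin helps".
    static Node sample() {
        return new Node("S",
                new Node("NP", new Node("aspirin")),
                new Node("VP", new Node("helps")));
    }

    public static void main(String[] args) {
        Node original = sample();
        Node working = original.copy();      // work outside the "cas"
        working.wrapChild(0, "MEDICATION");  // insert the label on the copy only
        System.out.println("copy:     " + working.render());
        System.out.println("original: " + original.render());
    }
}
```

Running main prints the copy with a MEDICATION node wrapped around the NP, while the original tree is printed unchanged. That is the property the copy methods are meant to give you before you insert nodes and print or otherwise consume the result.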