ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: markable types
Date Tue, 20 May 2014 21:55:15 GMT

On 05/20/2014 07:24 AM, Anirban Chakraborti wrote:
> Here it is
>
> 1. The Ctakes typesystem represents syntax trees with three types:
> TopTreebankNode, TreebankNode, and TerminalTreebankNode - Understood.
>
> 2. The parser works at the sentence level, so a standard thing is to
> simultaneously get all trees/sentences by doing:
> for(TopTreebankNode tree : JCasUtil.select(jcas, TopTreebankNode.class)) -
> Understood
>
> My question is that a single word in a sentence may belong to various
> types simultaneously. How does the associated type class get stored in the
> nodes of the tree so that, when we parse the tree/sentence, we can select
> the type of interest and its associated features/attributes?
>
> What I want to understand is what the key/value pairs of each node are.
>
> Basically so that the following code works
>
> // DiseaseDisorderMention is the selected type class to be extracted
> List<DiseaseDisorderMention> disease =
>     new ArrayList<>(JCasUtil.select(jcas, DiseaseDisorderMention.class));
>
Ok, I think I understand a bit. The way all UIMA annotations work is
they just represent spans on a string, so yes, there can be multiple
annotations for a given word. In fact, it may be a little misleading
to even use "word" there, since UIMA has no sense of words, just strings
with character offsets.
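
To make the "spans on a string" point concrete, here is a rough plain-Java
sketch with no UIMA dependency. The Span class, the OverlapDemo class and the
example sentence are made up for illustration and are not cTAKES or UIMA API;
the type names are only used as string labels. Several spans sit on the same
characters, and picking out one kind is just a filter, which is roughly the
job JCasUtil.select does against a real CAS:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for UIMA annotations; not cTAKES or UIMA API.
final class OverlapDemo {
    // A span is nothing more than a type label plus character offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        String coveredText(String doc) {
            return doc.substring(begin, end);
        }
    }

    static final String DOC = "Patient denies chest pain.";

    // All of these coexist over one string; three of them touch "chest".
    static List<Span> annotations() {
        return Arrays.asList(
            new Span("WordToken", 15, 20),
            new Span("TerminalTreebankNode", 15, 20),
            new Span("SignSymptomMention", 15, 25),
            new Span("WordToken", 21, 25));
    }

    // Keep only spans of one type (the analogue of selecting by class).
    static List<Span> select(List<Span> all, String type) {
        List<Span> out = new ArrayList<>();
        for (Span s : all) {
            if (s.type.equals(type)) {
                out.add(s);
            }
        }
        return out;
    }

    // Every span whose offsets include the given character position.
    static List<Span> at(List<Span> all, int offset) {
        List<Span> out = new ArrayList<>();
        for (Span s : all) {
            if (s.begin <= offset && offset < s.end) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // List everything that covers the first character of "chest".
        for (Span s : at(annotations(), 15)) {
            System.out.println(s.type + " -> " + s.coveredText(DOC));
        }
    }
}
```

Nothing in the model links the three spans at offset 15 to each other; they
only happen to share offsets.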

So the above code should work to get DiseaseDisorderMention types. And
there is really no relation to the parse tree in the way the processing
works. So the extractor runs and creates a bunch of
DiseaseDisorderMention spans, then the parser runs and creates a bunch
of TreebankNode spans, and never the twain shall meet unless you supply
some code to bring them together.

So if you're looking for an easy way to navigate a parse tree and find
the named entities in it or vice versa, it's possible to do but it's not
automatic. You would probably want to start with the utility classes I
pointed you to earlier, possibly make your own modifications, and maybe
you would even need to create your own derived types to represent what
you're interested in.
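
As a rough idea of what "bringing them together" involves, the sketch below
joins a mention to a parse tree purely by character offsets. It is plain Java
with no UIMA dependency; Node, TreeJoinDemo and the hand-built tree are
hypothetical, and this is not how the cTAKES utility classes are implemented,
only the same general idea of finding the smallest subtree that covers a span:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a parse tree plus a separate mention; not cTAKES API.
final class TreeJoinDemo {
    static final class Node {
        final String label;
        final int begin;
        final int end;
        final List<Node> children;

        Node(String label, int begin, int end, Node... children) {
            this.label = label;
            this.begin = begin;
            this.end = end;
            this.children = Arrays.asList(children);
        }
    }

    // Descend to the deepest node whose offsets cover [begin, end).
    // Returns null when even this node does not cover the span.
    static Node smallestCovering(Node node, int begin, int end) {
        if (node.begin > begin || node.end < end) {
            return null;
        }
        for (Node child : node.children) {
            Node hit = smallestCovering(child, begin, end);
            if (hit != null) {
                return hit;
            }
        }
        return node;
    }

    // Hand-built tree for "Patient denies chest pain."
    static Node exampleTree() {
        return new Node("S", 0, 26,
            new Node("NP", 0, 7, new Node("NN", 0, 7)),
            new Node("VP", 8, 25,
                new Node("VBZ", 8, 14),
                new Node("NP", 15, 25,
                    new Node("NN", 15, 20),
                    new Node("NN", 21, 25))),
            new Node(".", 25, 26));
    }

    public static void main(String[] args) {
        // A mention produced by some other component, known only by offsets.
        int mentionBegin = 15;
        int mentionEnd = 25;
        Node hit = smallestCovering(exampleTree(), mentionBegin, mentionEnd);
        System.out.println(hit.label + " [" + hit.begin + "," + hit.end + ")");
        // prints: NP [15,25)
    }
}
```

Against a real CAS, uimaFIT's JCasUtil.selectCovering and
JCasUtil.selectCovered do offset-based lookups of this kind; check the
signatures in the uimaFIT version you are using.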

Tim



>
> Hope I am clearer this time
>
>  Anir
>
>
>
>
> On Tue, May 20, 2014 at 4:32 PM, Miller, Timothy <
> Timothy.Miller@childrens.harvard.edu> wrote:
>
>> I don't understand this question. Can you try to rephrase it? Or maybe if
>> you tell me what you want to do that would help me understand.
>>
>> ________________________________________
>> From: Anirban Chakraborti [chakraborti.anirban@googlemail.com]
>> Sent: Tuesday, May 20, 2014 6:34 AM
>> To: dev@ctakes.apache.org
>> Subject: Re: markable types
>>
>> thanks again Timothy
>>
>> final question for now
>>
>> You had explained that each sentence is parsed and is converted to a
>> tree with head and terminal nodes. Is the typesystem of ctakes a feature
>> of the node, i.e. can one node belong to two or more typesystems and their
>> further attributes, OR for each type system is there a syntax tree for
>> every sentence parsed? I mean a sentence has various trees attached to it
>> but there is a 1:1 mapping between the node and typesystem.
>> Anir
>>
>>
>> On Tue, May 20, 2014 at 2:17 AM, Miller, Timothy <
>> Timothy.Miller@childrens.harvard.edu> wrote:
>>
>>> On 05/18/2014 07:40 AM, Anirban Chakraborti wrote:
>>>> Timothy,
>>>>
>>>> 1. so to get concepts of procedure, lab (if any), disease disorder, sign
>>>> symptoms, anatomical sites, I would need to do
>>>>
>>>> List<MedicationMention> meds =
>>>>     new ArrayList<>(JCasUtil.select(jcas, MedicationMention.class));
>>>> List<DiseaseDisorderMention> disease =
>>>>     new ArrayList<>(JCasUtil.select(jcas, DiseaseDisorderMention.class));
>>>> List<SignSymptomMention> signs =
>>>>     new ArrayList<>(JCasUtil.select(jcas, SignSymptomMention.class));
>>>> List<AnatomicalSiteMention> anatomy =
>>>>     new ArrayList<>(JCasUtil.select(jcas, AnatomicalSiteMention.class));
>>>> List<LabMention> labs =
>>>>     new ArrayList<>(JCasUtil.select(jcas, LabMention.class));
>>>>
>>>> then check the size of each list {meds, disease, signs, anatomy, labs},
>>>> print out the list or make a new list using java.util.List or
>>>> java.util.ArrayList as the case might be. Right ...
>>> yep
>>>> 2. I am more interested in the IdentifiedAnnotation class. However there
>>>> are concepts like FractionAnnotation which are not defined as enums in
>>>> const.java. How do I handle them? Do I need to add them to the const.java
>>>> file?
>>> nope, you probably just want EntityMention (for anatomical sites) and
>>> EventMention (for all clinical events, including DiseaseDisorder,
>>> Procedure, SignSymptom, etc.).
>>>
>>>> 3. what exactly is the functional difference between, say,
>>>> MedicationEventMention.java, MedicationMention.java, Medication.java and
>>>> MedicationEventMention_type.java? I understand there is a similar
>>>> difference between the classes for lab, procedure etc...
>>> The types ending in _type.java are UIMA-internal types, you can ignore.
>>> Medication is a referential type -- something in the real world that
>>> could be referred to multiple times in a document. What you probably
>>> want are the mention types. Here I believe MedicationMention is the
>>> preferred type going forward for a particular mention of a medication in
>>> text (MedicationEventMention is the same thing but not preferred going
>>> forward).
>>>
>>>
>>>> 4. You had explained that each sentence is parsed and is converted to a
>>>> tree with head and terminal nodes. Is the typesystem of ctakes a feature
>>>> of the node, i.e. can one node belong to two or more typesystems and their
>>>> further attributes, OR for each type system is there a syntax tree for
>>>> every sentence parsed? I mean a sentence has various trees attached to it
>>>> but there is a 1:1 mapping between the node and typesystem.
>>>>
>>>> Many Thanks
>>>>
>>>> Anirban
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, May 15, 2014 at 5:03 PM, Miller, Timothy <
>>>> Timothy.Miller@childrens.harvard.edu> wrote:
>>>>
>>>>> Anir -- I'm not sure I understand your question but from your example it
>>>>> doesn't sound like a tree exactly. If you just want a list of medication
>>>>> concepts you can do something like:
>>>>>
>>>>> List<MedicationMention> meds =
>>>>>     new ArrayList<>(JCasUtil.select(jcas, MedicationMention.class));
>>>>> (I believe MedicationMention is the correct class but check your output.)
>>>>> If you really do want to put them into a syntax tree, there are also
>>>>> methods for doing that in the AnnotationTreeUtils class.
>>>>>
>>>>> getAnnotationTree(JCas, Annotation) will give you the tree for the whole
>>>>> sentence containing the annotation you give it.
>>>>> annotationNode(JCas, Annotation) will give you the smallest subtree
>>>>> covering the annotation you give it.
>>>>> insertAnnotationNode(JCas, TopTreebankNode, Annotation, String) will
>>>>> insert a node into the tree specified at the level specified by the
>>>>> annotation with the category specified by the string. So for example if
>>>>> you had meds as above you could then do:
>>>>>
>>>>> for (MedicationMention med : meds) {
>>>>>   AnnotationTreeUtils.insertAnnotationNode(jcas,
>>>>>       AnnotationTreeUtils.getAnnotationTree(jcas, med), med, "MEDICATION");
>>>>> }
>>>>>
>>>>> which would insert a new node into every tree with the label "MEDICATION"
>>>>> in every position where a medication was found.
>>>>>
>>>>> One caveat to the above code is that these methods actually will change
>>>>> the tree in the cas. That might be ok for some use cases but for many you
>>>>> want to work on a tree outside the cas, so that's why there are also
>>>>> methods:
>>>>> getTreeCopy(JCas, TopTreebankNode)
>>>>> getTreeCopy(JCas, TreebankNode)
>>>>>
>>>>> If you use the getAnnotationTree method to obtain the tree you want, then
>>>>> you can get a copy from these methods, then use the insert methods and do
>>>>> something with them immediately (like print them out), without altering
>>>>> the originals in the cas if other AEs may use them.
>>>>>
>>>>> Tim
>>>>>
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Anirban Chakraborti [chakraborti.anirban@googlemail.com]
>>>>> Sent: Sunday, May 11, 2014 9:15 AM
>>>>> To: dev@ctakes.apache.org
>>>>> Subject: Re: markable types
>>>>>
>>>>> Steven,
>>>>>
>>>>> Would you have any example code of a tree parser so the output can be
>>>>> arranged as per need? I mean, after successful annotation, I want to
>>>>> extract certain concepts like medication only and arrange them in a new
>>>>> tree so that all annotations in reference to the medication concept and
>>>>> their sources are listed together.
>>>>>
>>>>> Anir
>>>>>
>>>>>
>>>>> On Sun, May 11, 2014 at 3:55 PM, Steven Bethard <
>>>>> steven.bethard@gmail.com> wrote:
>>>>>
>>>>>> I don't think "not something anyone would want extracted" should be an
>>>>>> argument against anything. We already have constituent and dependency
>>>>>> parse trees in the type system, and those would fall under that
>>>>>> category.
>>>>>>
>>>>>> So +1 on markables in the type system. (In general, +1 on moving
>>>>>> module-specific types to the standard type system. I'm not sure what
>>>>>> the real benefit of splitting them out is...)
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Fri, May 9, 2014 at 11:53 AM, Miller, Timothy
>>>>>> <Timothy.Miller@childrens.harvard.edu> wrote:
>>>>>>> What do people think about taking the "markable" types out of the
>>>>>>> coreference project and adding them to the standard type system? This is
>>>>>>> a pretty standard concept in coreference that doesn't really have a
>>>>>>> great natural representation in the current type system -- it
>>>>>>> encompasses IdentifiedAnnotations as well as pronouns ("It", "him",
>>>>>>> "her") and some determiners ("this").
>>>>>>>
>>>>>>> The drawback I can see is that it is probably not something anyone would
>>>>>>> want extracted -- ultimately you want the actual coref pairs or chains.
>>>>>>> But it is useful for things like representing gold standard input or
>>>>>>> splitting coreference resolution into separate markable recognition and
>>>>>>> relation classification steps.
>>>>>>>
>>>>>>> Tim
>>>>>>>
>>> --
>>> Tim Miller
>>> Instructor
>>> Boston Children's Hospital and Harvard Medical School
>>> timothy.miller@childrens.harvard.edu
>>> 617-919-1223
>>>
>>>

-- 
Tim Miller
Instructor
Boston Children's Hospital and Harvard Medical School
timothy.miller@childrens.harvard.edu
617-919-1223

