ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: markable types
Date Sun, 18 May 2014 00:38:15 GMT
Again, I'm not sure I understand, so please clarify if this isn't what you're looking for.

The cTAKES type system represents syntax trees with three types: TopTreebankNode, TreebankNode,
and TerminalTreebankNode. TopTreebankNode and TerminalTreebankNode inherit from TreebankNode, adding
properties specific to the root and the leaves of a tree, respectively (a terminal also carries its
part-of-speech tag and its word). For most nodes, calling getNodeType() will give you the category you
want. For terminals, getNodeType() and getNodeValue() return the POS tag and the word, respectively.
You can get the subtrees of a node with getChildren() and a specific subtree with getChildren(int),
where the int argument is 0-based. Each node is also connected to its parent via getParent(). Each
node also records its head word through the getHead() method (I think that's right, but I'm doing
this from memory, so you'll have to check), which returns an index into the array of _all_ terminals
(leaves) in the sentence. So if tree.getHead() returns 5, you would call getTerminals() on the
root tree and take the word at index 5 to get the head of tree.
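Since an illustrative example of the tree structure was requested, here is a small, self-contained Java sketch of the navigation described above. It does not use cTAKES: the Node class is a hypothetical stand-in whose accessor names only mirror the ones mentioned (getNodeType, getNodeValue, getChildren, getParent), and getHeadIndex() is an assumed name for the head accessor, which the real type system may name differently. The head lookup follows the recipe above: climb to the root, collect its terminals left to right, and index into them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the cTAKES tree types (NOT org.apache.ctakes classes).
// Accessor names mirror those in the message; getHeadIndex() is an assumed name.
class Node {
    private final String nodeType;   // category such as "NP"; the POS tag on a terminal
    private final String nodeValue;  // the word on a terminal, null on phrases
    private final int headIndex;     // index into the sentence's terminals
    private final List<Node> children = new ArrayList<>();
    private Node parent;

    private Node(String nodeType, String nodeValue, int headIndex) {
        this.nodeType = nodeType;
        this.nodeValue = nodeValue;
        this.headIndex = headIndex;
    }

    // A terminal's head is itself, so its head index is its own position.
    static Node terminal(String pos, String word, int index) {
        return new Node(pos, word, index);
    }

    static Node phrase(String type, int headIndex, Node... kids) {
        Node n = new Node(type, null, headIndex);
        for (Node k : kids) {
            k.parent = n;
            n.children.add(k);
        }
        return n;
    }

    String getNodeType() { return nodeType; }
    String getNodeValue() { return nodeValue; }
    int getHeadIndex() { return headIndex; }
    Node getParent() { return parent; }
    List<Node> getChildren() { return children; }
    Node getChildren(int i) { return children.get(i); }  // 0-based
    boolean isLeaf() { return children.isEmpty(); }
}

class TreeDemo {
    // Leaves in left-to-right order, playing the role of getTerminals() on the root.
    static List<Node> terminals(Node node) {
        List<Node> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Node node, List<Node> out) {
        if (node.isLeaf()) {
            out.add(node);
            return;
        }
        for (Node child : node.getChildren()) collect(child, out);
    }

    // Climb getParent() links to the root, then index into its terminals.
    static String headWord(Node node) {
        Node root = node;
        while (root.getParent() != null) root = root.getParent();
        return terminals(root).get(node.getHeadIndex()).getNodeValue();
    }

    // Penn-style bracketed rendering of a (sub)tree.
    static String bracket(Node node) {
        if (node.isLeaf()) return "(" + node.getNodeType() + " " + node.getNodeValue() + ")";
        StringBuilder sb = new StringBuilder("(").append(node.getNodeType());
        for (Node child : node.getChildren()) sb.append(' ').append(bracket(child));
        return sb.append(')').toString();
    }

    // Toy parse of "Patient takes aspirin" with hand-assigned head indices.
    static Node sampleTree() {
        Node subject = Node.phrase("NP", 0, Node.terminal("NN", "Patient", 0));
        Node predicate = Node.phrase("VP", 1, Node.terminal("VBZ", "takes", 1),
                Node.phrase("NP", 2, Node.terminal("NN", "aspirin", 2)));
        return Node.phrase("S", 1, subject, predicate);
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println(bracket(root));
        // (S (NP (NN Patient)) (VP (VBZ takes) (NP (NN aspirin))))
        Node object = root.getChildren(1).getChildren(1);
        System.out.println(object.getNodeType() + " head: " + headWord(object));  // NP head: aspirin
        System.out.println("S head: " + headWord(root));                           // S head: takes
    }
}
```

In a real pipeline the head indices come from the parser; here they are set by hand only so the lookup has something to find.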
The parser works at the sentence level, so a standard approach is to iterate over all the trees
in the document, one per sentence:
for (TopTreebankNode tree : JCasUtil.select(jcas, TopTreebankNode.class)) {
  // do something with this tree
}

Hope this helps.

On May 17, 2014, at 1:54 PM, Anirban Chakraborti wrote:

> Thanks Timothy,
> I get the point, but it would be very helpful if you had an illustrative
> example of a tree structure describing the branches and nodes generated
> by cTAKES. I have got the hang of how to parse the tree now.
> On Thu, May 15, 2014 at 5:03 PM, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:
>> Anir -- I'm not sure I understand your question, but from your example it
>> doesn't sound like a tree exactly. If you just want a list of medication
>> concepts, you can do something like:
>> List<MedicationMention> meds = new ArrayList<>(JCasUtil.select(jcas, MedicationMention.class));
>> (I believe MedicationMention is the correct class, but check your output.)
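Conceptually, selecting by class keeps every annotation that is an instance of the requested type (as I understand UIMA indexes, subtypes included), and wrapping the result in an ArrayList gives an independent list. The sketch below mimics that over a plain Java list; Annot, MedMention, SignSymptomMention, and select are hypothetical names for illustration only, not the cTAKES or uimaFIT classes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical annotation types (NOT the cTAKES/uimaFIT classes).
class Annot {
    final int begin, end;
    final String text;

    Annot(int begin, int end, String text) {
        this.begin = begin;
        this.end = end;
        this.text = text;
    }
}

class MedMention extends Annot {
    MedMention(int begin, int end, String text) { super(begin, end, text); }
}

class SignSymptomMention extends Annot {
    SignSymptomMention(int begin, int end, String text) { super(begin, end, text); }
}

class SelectDemo {
    // Keep the elements that are instances of `type` (subtypes included),
    // in their original order, copied into a new independent list.
    static <T> List<T> select(Collection<?> index, Class<T> type) {
        List<T> out = new ArrayList<>();
        for (Object o : index) {
            if (type.isInstance(o)) out.add(type.cast(o));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Annot> index = List.of(
                new MedMention(0, 7, "aspirin"),
                new SignSymptomMention(12, 20, "headache"),
                new MedMention(25, 34, "ibuprofen"));
        for (MedMention m : select(index, MedMention.class)) {
            System.out.println(m.text + " [" + m.begin + ", " + m.end + ")");
        }
        // aspirin [0, 7)
        // ibuprofen [25, 34)
    }
}
```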
>> If you really do want to put them into a syntax tree, there are also
>> methods for doing that in the AnnotationTreeUtils class:
>> getAnnotationTree(JCas, Annotation) will give you the tree for the whole
>> sentence containing the annotation you give it.
>> annotationNode(JCas, Annotation) will give you the smallest subtree
>> covering the annotation you give it.
>> insertAnnotationNode(JCas, TopTreebankNode, Annotation, String) will
>> insert a node into the specified tree, spanning the given annotation and
>> labeled with the category given by the string. So, for example, if you
>> had meds as above, you could then do:
>> for (MedicationMention med : meds) {
>>   AnnotationTreeUtils.insertAnnotationNode(jcas,
>>       AnnotationTreeUtils.getAnnotationTree(jcas, med), med, "MEDICATION");
>> }
>> which would insert a new node labeled "MEDICATION" into every tree,
>> at every position where a medication was found.
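To make the insertion idea concrete, here is a rough Java sketch of the two steps it involves: finding the smallest subtree covering a character span, then wrapping the children that fall inside the span in a new labeled node. It works on a hypothetical SpanNode class with character offsets, not on cTAKES types, and it is only a guess at the behavior; the real AnnotationTreeUtils methods may treat exact matches, crossing brackets, and partial tokens differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical span-annotated tree node with character offsets [begin, end).
// This is NOT the cTAKES TreebankNode class; it only illustrates the idea.
class SpanNode {
    final String label;  // category or POS tag
    final String word;   // only set on leaves
    final int begin, end;
    SpanNode parent;
    final List<SpanNode> children = new ArrayList<>();

    SpanNode(String label, String word, int begin, int end) {
        this.label = label; this.word = word; this.begin = begin; this.end = end;
    }

    static SpanNode leaf(String pos, String word, int begin) {
        return new SpanNode(pos, word, begin, begin + word.length());
    }

    // A phrase spans from its first child's begin to its last child's end.
    static SpanNode phrase(String label, SpanNode... kids) {
        SpanNode n = new SpanNode(label, null, kids[0].begin, kids[kids.length - 1].end);
        for (SpanNode k : kids) { k.parent = n; n.children.add(k); }
        return n;
    }

    String bracket() {
        if (children.isEmpty()) return "(" + label + " " + word + ")";
        StringBuilder sb = new StringBuilder("(").append(label);
        for (SpanNode c : children) sb.append(' ').append(c.bracket());
        return sb.append(')').toString();
    }
}

class InsertDemo {
    // Descend while some child still covers [begin, end); return the deepest such node.
    static SpanNode smallestCovering(SpanNode node, int begin, int end) {
        for (SpanNode child : node.children) {
            if (child.begin <= begin && end <= child.end) return smallestCovering(child, begin, end);
        }
        return node;
    }

    // Wrap the children lying fully inside [begin, end) in a new node with `label`.
    // Returns the new node, or null if no child lies fully inside the span.
    static SpanNode insert(SpanNode root, int begin, int end, String label) {
        SpanNode cover = smallestCovering(root, begin, end);
        while (cover.parent != null && cover.parent.begin == begin && cover.parent.end == end) {
            cover = cover.parent;  // climb unary chains with exactly the same span
        }
        if (cover.begin == begin && cover.end == end && cover.parent != null) {
            cover = cover.parent;  // exact match: wrap that node from its parent
        }
        List<SpanNode> kids = cover.children;
        int first = -1, last = -1;
        for (int i = 0; i < kids.size(); i++) {
            if (kids.get(i).begin >= begin && kids.get(i).end <= end) {
                if (first < 0) first = i;
                last = i;
            }
        }
        if (first < 0) return null;
        SpanNode wrapper = new SpanNode(label, null, kids.get(first).begin, kids.get(last).end);
        List<SpanNode> run = kids.subList(first, last + 1);
        for (SpanNode k : run) { k.parent = wrapper; wrapper.children.add(k); }
        run.clear();  // detach the wrapped children from `cover`
        wrapper.parent = cover;
        kids.add(first, wrapper);
        return wrapper;
    }

    // "Patient takes aspirin daily"
    static SpanNode sample() {
        return SpanNode.phrase("S",
                SpanNode.phrase("NP", SpanNode.leaf("NN", "Patient", 0)),
                SpanNode.phrase("VP",
                        SpanNode.leaf("VBZ", "takes", 8),
                        SpanNode.phrase("NP", SpanNode.leaf("NN", "aspirin", 14)),
                        SpanNode.phrase("ADVP", SpanNode.leaf("RB", "daily", 22))));
    }

    public static void main(String[] args) {
        SpanNode root = sample();
        insert(root, 14, 21, "MEDICATION");
        System.out.println(root.bracket());
        // (S (NP (NN Patient)) (VP (VBZ takes) (MEDICATION (NP (NN aspirin))) (ADVP (RB daily))))
    }
}
```

In this sketch a span that only partly covers some children wraps just the children fully inside it, and a span inside a single word returns null; both are deliberate simplifications.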
>> One caveat to the above code is that these methods actually change
>> the tree in the CAS. That might be OK for some use cases, but for many you
>> want to work on a tree outside the CAS, which is why there are also these methods:
>> getTreeCopy(JCas, TopTreebankNode)
>> getTreeCopy(JCas, TreebankNode)
>> If you use the getAnnotationTree method to obtain the tree you want, you
>> can then get a copy from these methods, use the insert methods on the copy,
>> and do something with it immediately (like print it out), without altering
>> the originals in the CAS that other AEs may use.
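The copy-then-modify pattern can be sketched in plain Java as follows. CNode, copy(), and wrapChild() are hypothetical names, not the getTreeCopy implementation; the point is that a deep copy allocates new nodes and rewires parent links into the copy, so edits to the copy leave the original tree untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree node (NOT a cTAKES class).
class CNode {
    final String label;
    CNode parent;
    final List<CNode> children = new ArrayList<>();

    // Builds a node and adopts the given children (re-pointing their parent links).
    CNode(String label, CNode... kids) {
        this.label = label;
        for (CNode k : kids) {
            k.parent = this;
            children.add(k);
        }
    }

    // Deep copy: fresh nodes all the way down, with parent links inside the copy.
    CNode copy() {
        CNode c = new CNode(label);
        for (CNode k : children) {
            CNode kc = k.copy();
            kc.parent = c;
            c.children.add(kc);
        }
        return c;
    }

    String bracket() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (CNode k : children) sb.append(' ').append(k.bracket());
        return sb.append(')').toString();
    }
}

class CopyDemo {
    // Replace parent's i-th child with a new node labeled `label` that contains it.
    static CNode wrapChild(CNode parent, int i, String label) {
        CNode wrapper = new CNode(label, parent.children.get(i));
        wrapper.parent = parent;
        parent.children.set(i, wrapper);
        return wrapper;
    }

    public static void main(String[] args) {
        CNode original = new CNode("S",
                new CNode("NP", new CNode("aspirin")),
                new CNode("VP", new CNode("helps")));
        CNode scratch = original.copy();
        wrapChild(scratch, 0, "MEDICATION");    // edit only the copy
        System.out.println(original.bracket()); // (S (NP aspirin) (VP helps))
        System.out.println(scratch.bracket());  // (S (MEDICATION (NP aspirin)) (VP helps))
    }
}
```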
>> Tim
>> ________________________________________
>> From: Anirban Chakraborti [chakraborti.anirban@googlemail.com]
>> Sent: Sunday, May 11, 2014 9:15 AM
>> To: dev@ctakes.apache.org
>> Subject: Re: markable types
>> Steven,
>> Do you have any example code for a tree parser, so the output can be
>> arranged as needed? I mean that, after successful annotation, I want to
>> extract only certain concepts, like medications, and arrange them in a new
>> tree so that all annotations referring to the medication concept, along
>> with their sources, are listed together.
>> Anir
>> On Sun, May 11, 2014 at 3:55 PM, Steven Bethard <steven.bethard@gmail.com> wrote:
>>> I don't think "not something anyone would want extracted" should be an
>>> argument against anything. We already have constituent and dependency
>>> parse trees in the type system, and those would fall under that
>>> category.
>>> So +1 on markables in the type system. (In general, +1 on moving
>>> module-specific types to the standard type system. I'm not sure what
>>> the real benefit of splitting them out is...)
>>> Steve
>>> On Fri, May 9, 2014 at 11:53 AM, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:
>>>> What do people think about taking the "markable" types out of the
>>>> coreference project and adding them to the standard type system? This is
>>>> a pretty standard concept in coreference that doesn't really have a
>>>> great natural representation in the current type system -- it
>>>> encompasses IdentifiedAnnotations as well as pronouns ("It", "him",
>>>> "her") and some determiners ("this").
>>>> The drawback I can see is that it is probably not something anyone would
>>>> want extracted -- ultimately you want the actual coref pairs or chains.
>>>> But it is useful for things like representing gold standard input or
>>>> splitting coreference resolution into separate markable recognition and
>>>> relation classification steps.
>>>> Tim
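As an illustration of the split proposed in the first message, with markable recognition as one step and relation classification as another, here is a minimal Java sketch of the hand-off between them. It assumes a hypothetical Markable class whose kinds mirror the examples given (IdentifiedAnnotations, pronouns, determiners), and it uses the common mention-pair scheme (every earlier markable within a fixed window is a candidate antecedent) as a stand-in. This is not how the cTAKES coreference module is implemented, just one way a classifier's input could be generated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical markable type (NOT a cTAKES type); the kinds mirror the examples
// in the message: IdentifiedAnnotations, pronouns, and some determiners.
class Markable {
    enum Kind { IDENTIFIED, PRONOUN, DETERMINER }

    final int begin, end;
    final String text;
    final Kind kind;

    Markable(int begin, int end, String text, Kind kind) {
        this.begin = begin;
        this.end = end;
        this.text = text;
        this.kind = kind;
    }
}

class MentionPairs {
    // Hand-off from markable recognition to relation classification:
    // sort markables by position and pair each one (as anaphor) with up to
    // `window` preceding markables (as candidate antecedents).
    static List<Markable[]> candidatePairs(List<Markable> markables, int window) {
        List<Markable> sorted = new ArrayList<>(markables);
        sorted.sort((a, b) -> Integer.compare(a.begin, b.begin));
        List<Markable[]> pairs = new ArrayList<>();
        for (int j = 1; j < sorted.size(); j++) {
            for (int i = Math.max(0, j - window); i < j; i++) {
                pairs.add(new Markable[] { sorted.get(i), sorted.get(j) });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // "Aspirin caused nausea. This resolved after it was stopped."
        List<Markable> markables = List.of(
                new Markable(0, 7, "Aspirin", Markable.Kind.IDENTIFIED),
                new Markable(15, 21, "nausea", Markable.Kind.IDENTIFIED),
                new Markable(23, 27, "This", Markable.Kind.DETERMINER),
                new Markable(43, 45, "it", Markable.Kind.PRONOUN));
        for (Markable[] p : candidatePairs(markables, 2)) {
            // A relation classifier would label each pair as coreferent or not.
            System.out.println("(" + p[0].text + ", " + p[1].text + ")");
        }
    }
}
```

With a window of 2 this prints five candidate pairs, from (Aspirin, nausea) through (This, it). Gold-standard markables, as mentioned above, could be fed into the same function to evaluate the classification step on its own.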
