ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: lvg entries
Date Fri, 18 Apr 2014 18:52:52 GMT
Thanks for tracking that down Andy.

I am making a pass at UimaFit-izing the configuration parameters for all
the annotators in the default pipeline, before I create the static
factory methods like we recently discussed. Should I go ahead and change
this to make default behavior be false?

Tim


On 04/18/2014 12:47 AM, andy mcmurry wrote:
> There is a lot of config handling, maybe PostLemmas is being set to true or
> configInit() is not setting up the NLM wrapper correctly.
>
> ctakes-lvg *README*
> Note: as distributed, PostLemmas is set to false.  This is done to reduce
> the size of the CAS.
> Set PostLemmas to true to have org.apache.ctakes.typesystem.type.Lemma
> annotations added to the CAS.
>
> *LvgAnnotator.xml *
> PostLemmas = True
>
> *LvgAnnotator.java*
> if (postLemmas) {
>      lvgResource.getLvgLex();
> }
>
> On Thu, Apr 17, 2014 at 3:23 PM, Masanz, James J. <Masanz.James@mayo.edu> wrote:
>
>> The normalizedForm field is filled in. It is used by dictionary lookup.
>>
>> So, for example, if the dictionary would contain "lymph node" but not
>> "lymph nodes", a document with text of "lymph nodes" would match the
>> dictionary entry "lymph node" because "node", being the normalized form of
>> "nodes", would be used when searching dictionary entries (in addition to
>> searching dictionary entries for "nodes")
>>
>> -----Original Message-----
>> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
>> Sent: Thursday, April 17, 2014 4:33 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: lvg entries
>>
>> Quick follow-up since I was interested. The current dependency parser
>> does have the option to use ctakes lemmas or do its own lemmatizing, but
>> that doesn't use the lemma field, it uses the normalizedForm field. I'm
>> not sure if that field is actually ever filled in -- on my example data
>> it is always null.
>>
>> Tim
>>
>> On 04/17/2014 01:57 PM, Masanz, James J. wrote:
>>> Offhand I recall at least one of the dependency parsers used the Lemma
>>> annotations at one point.
>>> Not sure if still does.
>>>
>>> There is an option for turning off the posting of the lemmas to the cas.
>>>
>>> Hope that helps
>>>
>>> -----Original Message-----
>>> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
>>> Sent: Thursday, April 17, 2014 11:27 AM
>>> To: dev@ctakes.apache.org
>>> Subject: lvg entries
>>>
>>> The LVG annotator creates an enormous number of "lemmas" for every
>>> WordToken in the CAS, and I'm wondering what the original purpose was? I
>>> think this is probably a minor bottleneck for speed but mostly a pretty
>>> big space hog (at least 50% of the space of xmi files in my tests).
>>>
>>> As of right now I'm not sure if any downstream components are using
>>> these lemmas, and on a manual inspection the precision seems to be
>>> pretty abysmal (meaning most of them are nonsensical as lexical
>>> variants), so as I said, just wondering if we can revisit why cTAKES
>>> generates so many and whether that component can be optimized.
>>>
>>> Thanks
>>> Tim
>>>
>>>
>>
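
The "lymph node" / "lymph nodes" example in the quoted thread can be sketched roughly as below. This is only an illustration of the idea described there (search the dictionary with the text as written and also with the normalized form), not the actual cTAKES dictionary lookup or LVG code. The class name, the tiny normalization table, and the one-entry dictionary are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from cTAKES.
class NormalizedLookupSketch {

    // Stand-in for what LVG would supply: surface form -> normalized form.
    static final Map<String, String> NORMALIZED = Map.of("nodes", "node");

    // Stand-in dictionary that contains the singular entry only.
    static final Set<String> DICTIONARY = Set.of("lymph node");

    // Returns the normalized form, or the lowercased token when none is known.
    static String normalize(String token) {
        String lower = token.toLowerCase();
        return NORMALIZED.getOrDefault(lower, lower);
    }

    // Tries the text as written first, then with every token normalized.
    static boolean matches(String text) {
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        if (DICTIONARY.contains(String.join(" ", tokens))) {
            return true;
        }
        List<String> normalized = new ArrayList<>();
        for (String token : tokens) {
            normalized.add(normalize(token));
        }
        return DICTIONARY.contains(String.join(" ", normalized));
    }

    public static void main(String[] args) {
        System.out.println(matches("lymph nodes"));   // found via the normalized form
        System.out.println(matches("lymph node"));    // found as written
        System.out.println(matches("lymph vessels")); // no entry either way
    }
}
```

Note that this lookup depends only on a per-token normalized form, not on separate Lemma annotations, which is consistent with the point in the thread that dictionary lookup uses the normalizedForm field.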

