ctakes-dev mailing list archives

From "Dligach, Dmitriy" <Dmitriy.Dlig...@childrens.harvard.edu>
Subject Re: lvg entries
Date Thu, 17 Apr 2014 17:14:25 GMT
I don’t know of any applications within cTAKES that make use of this… The reverse (mapping
from these “variants” to the normal form) may be useful though.

Dima
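
The reverse mapping mentioned above (from a surface variant back to its normal form) can be sketched as a simple inverted lookup table. This is only an illustration and not cTAKES or LVG code: the function names `build_reverse_index` and `normalize` and the small `variants_by_base` table are hypothetical, and the data is hand-written rather than LVG output.

```python
from collections import defaultdict


def build_reverse_index(variants_by_base):
    """Invert {base form: [variants]} into {variant: {base forms}}."""
    reverse = defaultdict(set)
    for base, variants in variants_by_base.items():
        for variant in variants:
            # Lowercase so lookups are case-insensitive.
            reverse[variant.lower()].add(base)
    return dict(reverse)


def normalize(token, index):
    """Return the sorted base forms for a token, or the lowercased token if unknown."""
    key = token.lower()
    return sorted(index.get(key, {key}))


# Hypothetical, hand-written variant table (an assumption, not LVG output).
variants_by_base = {
    "symptomatic": ["symptomatic", "symptomatics"],
    "symptom": ["symptom", "symptoms"],
}
reverse_index = build_reverse_index(variants_by_base)
```

Storing only this variant-to-base table, rather than every generated variant for every token, is one way the lookup direction could be kept without the per-token expansion discussed in the thread.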




On Apr 17, 2014, at 11:50, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:

> Sure, just as an example, I gave it a note with about 1000 words. It
> generates 11500 NonEmptyFSList elements (each is basically one lexical
> variant).
> 
> For the word "symptomatic", these are the first 10 of 20 lexical variants:
> Symptomaticer/JJ
> Symptomaticer/RB
> Symptomaticed/VB
> Symptomaticcing/VB
> Symptomatics/VB
> Symptomatics/NN
> Symptomaticked/VB
> Symptomatic/VB
> Symptomatic/JJ
> Symptomatic/RB
> 
> Tim
> 
> 
> On 04/17/2014 12:31 PM, Dligach, Dmitriy wrote:
>> Tim, this is a very interesting observation. Could you please send a few examples
>> of what LVG generates? Both sensical and non :)
>> 
>> Dima
>> 
>> 
>> 
>> 
>> On Apr 17, 2014, at 11:28, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:
>> 
>>> The LVG annotator creates an enormous number of "lemmas" for every
>>> WordToken in the CAS, and I'm wondering what the original purpose was? I
>>> think this is probably a minor bottleneck for speed but mostly a pretty
>>> big space hog (at least 50% of the space of xmi files in my tests).
>>> 
>>> As of right now I'm not sure if any downstream components are using
>>> these lemmas, and on a manual inspection the precision seems to be
>>> pretty abysmal (meaning most of them are nonsensical as lexical
>>> variants), so as I said, just wondering if we can revisit why cTAKES
>>> generates so many and whether that component can be optimized.
>>> 
>>> Thanks
>>> Tim
>>> 
>> 
> 

