ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: lvg entries
Date Thu, 17 Apr 2014 16:50:22 GMT
Sure, just as an example, I gave it a note with about 1000 words. It
generates 11500 NonEmptyFSList elements (each is basically one lexical
variant).
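
A count like the one above could be reproduced roughly as follows. This is only a minimal sketch, assuming the CAS has been serialized to XMI and that each list node appears as an element whose local name is NonEmptyFSList; the function name count_elements and the tiny sample document are hypothetical and not part of cTAKES.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(xmi_text):
    """Count XML elements by local tag name, ignoring the namespace."""
    counts = Counter()
    for elem in ET.fromstring(xmi_text).iter():
        # ElementTree tags look like "{namespace-uri}LocalName".
        local_name = elem.tag.rsplit("}", 1)[-1]
        counts[local_name] += 1
    return counts


# Hypothetical, hand-written stand-in for a real XMI file (an assumption,
# not actual cTAKES output).
sample = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore">
  <cas:NonEmptyFSList xmi:id="1" head="10" tail="2"/>
  <cas:NonEmptyFSList xmi:id="2" head="11" tail="3"/>
  <cas:EmptyFSList xmi:id="3"/>
</xmi:XMI>"""

counts = count_elements(sample)
print(counts["NonEmptyFSList"])
```

For a real file, the same function could be given the file's contents read as text; comparing counts.most_common() across element types would show which ones dominate the output.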

For the word "symptomatic", these are the first 10 of 20 lexical variants:
Symptomaticer/JJ
Symptomaticer/RB
Symptomaticed/VB
Symptomaticcing/VB
Symptomatics/VB
Symptomatics/NN
Symptomaticked/VB
Symptomatic/VB
Symptomatic/JJ
Symptomatic/RB

Tim


On 04/17/2014 12:31 PM, Dligach, Dmitriy wrote:
> Tim, this is a very interesting observation. Could you please send a few examples of
> what LVG generates? Both sensical and non :)
>
> Dima
>
>
>
>
> On Apr 17, 2014, at 11:28, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:
>
>> The LVG annotator creates an enormous number of "lemmas" for every
>> WordToken in the CAS, and I'm wondering what the original purpose was? I
>> think this is probably a minor bottleneck for speed but mostly a pretty
>> big space hog (at least 50% of the space of xmi files in my tests).
>>
>> As of right now I'm not sure if any downstream components are using
>> these lemmas, and on a manual inspection the precision seems to be
>> pretty abysmal (meaning most of them are nonsensical as lexical
>> variants), so as I said, just wondering if we can revisit why cTAKES
>> generates so many and whether that component can be optimized.
>>
>> Thanks
>> Tim
>>
>

