ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: lvg entries
Date Thu, 17 Apr 2014 20:34:22 GMT
Thanks James. Does it ring a bell to you that the original intention was
something like query expansion for a dictionary lookup?
Tim


On 04/17/2014 01:57 PM, Masanz, James J. wrote:
> Offhand I recall at least one of the dependency parsers used the Lemma annotations at one point.
> Not sure if it still does.
>
> There is an option for turning off the posting of the lemmas to the cas.
>
> Hope that helps
>
> -----Original Message-----
> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu] 
> Sent: Thursday, April 17, 2014 11:27 AM
> To: dev@ctakes.apache.org
> Subject: lvg entries
>
> The LVG annotator creates an enormous number of "lemmas" for every
> WordToken in the CAS, and I'm wondering what the original purpose was. I
> think this is probably a minor bottleneck for speed but mostly a pretty
> big space hog (at least 50% of the space of xmi files in my tests).
>
> As of right now I'm not sure if any downstream components are using
> these lemmas, and on a manual inspection the precision seems to be
> pretty abysmal (meaning most of them are nonsensical as lexical
> variants), so as I said, just wondering if we can revisit why cTAKES
> generates so many and whether that component can be optimized.
>
> Thanks
> Tim
>
>

