ctakes-dev mailing list archives

From "Masanz, James J." <Masanz.Ja...@mayo.edu>
Subject RE: lvg entries
Date Thu, 17 Apr 2014 22:16:58 GMT

Before the switch to OpenNLP (which was done before the first open-source release of cTAKES),
I believe the Lemma annotations were used by the POS tagger and/or phrasal parser.  As far
as I know, that was the original intention of the Lemmas. I believe they were turned off by
default for some releases, until someone started to use them (or at least look at maybe using
them).

That's all just from memory. We'd have to look through histories to see when things changed.

I don't think the Lemma annotations were ever used for dictionary lookup. That used the (single)
output of the normalizer function of the LVG component.

-----Original Message-----
From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu] 
Sent: Thursday, April 17, 2014 3:34 PM
To: dev@ctakes.apache.org
Subject: Re: lvg entries

Thanks James. Does it ring a bell to you that the original intention was
something like query expansion for a dictionary lookup?
Tim


On 04/17/2014 01:57 PM, Masanz, James J. wrote:
> Offhand I recall at least one of the dependency parsers used the Lemma annotations at
> one point.
> Not sure if it still does.
>
> There is an option for turning off the posting of the lemmas to the cas.
>
> Hope that helps
>
> -----Original Message-----
> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu] 
> Sent: Thursday, April 17, 2014 11:27 AM
> To: dev@ctakes.apache.org
> Subject: lvg entries
>
> The LVG annotator creates an enormous number of "lemmas" for every
> WordToken in the CAS, and I'm wondering what the original purpose was? I
> think this is probably a minor bottleneck for speed but mostly a pretty
> big space hog (at least 50% of the space of xmi files in my tests).
>
> As of right now I'm not sure if any downstream components are using
> these lemmas, and on a manual inspection the precision seems to be
> pretty abysmal (meaning most of them are nonsensical as lexical
> variants), so as I said, just wondering if we can revisit why cTAKES
> generates so many and whether that component can be optimized.
>
> Thanks
> Tim
>
>

