ctakes-dev mailing list archives

From "Masanz, James J." <Masanz.Ja...@mayo.edu>
Subject RE: lvg entries
Date Thu, 17 Apr 2014 17:57:02 GMT

Offhand, I recall that at least one of the dependency parsers used the Lemma annotations at one point.
Not sure if it still does.

There is an option for turning off the posting of the lemmas to the CAS.

Hope that helps

-----Original Message-----
From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu] 
Sent: Thursday, April 17, 2014 11:27 AM
To: dev@ctakes.apache.org
Subject: lvg entries

The LVG annotator creates an enormous number of "lemmas" for every
WordToken in the CAS, and I'm wondering what the original purpose was? I
think this is probably a minor bottleneck for speed but mostly a pretty
big space hog (at least 50% of the space of xmi files in my tests).

As of right now I'm not sure if any downstream components are using
these lemmas, and on a manual inspection the precision seems to be
pretty abysmal (meaning most of them are nonsensical as lexical
variants), so as I said, just wondering if we can revisit why cTAKES
generates so many and whether that component can be optimized.

