ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject lvg entries
Date Thu, 17 Apr 2014 16:27:22 GMT
The LVG annotator creates an enormous number of "lemmas" for every
WordToken in the CAS, and I'm wondering what the original purpose was. I
think this is probably a minor speed bottleneck, but mostly it is a
pretty big space hog (at least 50% of the size of the XMI files in my
tests).

As of right now I'm not sure whether any downstream components use
these lemmas, and on manual inspection the precision seems pretty
abysmal (most of them are nonsensical as lexical variants). So, as I
said, I'm just wondering whether we can revisit why cTAKES generates so
many and whether that component can be optimized.

