ctakes-dev mailing list archives

From andy mcmurry <mcmurry.a...@gmail.com>
Subject Re: lvg entries
Date Fri, 18 Apr 2014 04:47:08 GMT
There is a lot of config handling; maybe PostLemmas is being set to true, or
configInit() is not setting up the NLM wrapper correctly.

ctakes-lvg *README*
Note: as distributed, PostLemmas is set to false.  This is done to reduce
the size of the CAS.
Set PostLemmas to true to have org.apache.ctakes.typesystem.type.Lemma
annotations added to the CAS.

*LvgAnnotator.xml*
PostLemmas = True

*LvgAnnotator.java*
if (postLemmas) {
    lvgResource.getLvgLex();
}

On Thu, Apr 17, 2014 at 3:23 PM, Masanz, James J. <Masanz.James@mayo.edu> wrote:

> The normalizedForm field is filled in. It is used by dictionary lookup.
>
> So, for example, if the dictionary would contain "lymph node" but not
> "lymph nodes", a document with text of "lymph nodes" would match the
> dictionary entry "lymph node" because "node", being the normalized form of
> "nodes", would be used when searching dictionary entries (in addition to
> searching dictionary entries for "nodes")
>
> -----Original Message-----
> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
> Sent: Thursday, April 17, 2014 4:33 PM
> To: dev@ctakes.apache.org
> Subject: Re: lvg entries
>
> Quick follow-up since I was interested. The current dependency parser
> does have the option to use ctakes lemmas or do its own lemmatizing, but
> that doesn't use the lemma field, it uses the normalizedForm field. I'm
> not sure if that field is actually ever filled in -- on my example data
> it is always null.
>
> Tim
>
> On 04/17/2014 01:57 PM, Masanz, James J. wrote:
> > Offhand I recall at least one of the dependency parsers used the Lemma
> > annotations at one point.
> > Not sure if still does.
> >
> > There is an option for turning off the posting of the lemmas to the cas.
> >
> > Hope that helps
> >
> > -----Original Message-----
> > From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
> > Sent: Thursday, April 17, 2014 11:27 AM
> > To: dev@ctakes.apache.org
> > Subject: lvg entries
> >
> > The LVG annotator creates an enormous number of "lemmas" for every
> > WordToken in the CAS, and I'm wondering what the original purpose was? I
> > think this is probably a minor bottleneck for speed but mostly a pretty
> > big space hog (at least 50% of the space of xmi files in my tests).
> >
> > As of right now I'm not sure if any downstream components are using
> > these lemmas, and on a manual inspection the precision seems to be
> > pretty abysmal (meaning most of them are nonsensical as lexical
> > variants), so as I said, just wondering if we can revisit why cTAKES
> > generates so many and whether that component can be optimized.
> >
> > Thanks
> > Tim
> >
> >
>
>
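
For anyone following the thread, the normalized-form matching James describes
above ("lymph nodes" in a document matching the dictionary entry "lymph node")
can be sketched roughly as below. This is only an illustration, not the actual
cTAKES dictionary lookup code: the class name, the NORMALIZED map (standing in
for the normalizedForm values that LVG fills in), and the one-entry DICTIONARY
are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.Arrays;

public class NormalizedLookupSketch {

    // Hypothetical stand-in for the normalizedForm values LVG would supply.
    static final Map<String, String> NORMALIZED = new HashMap<>();

    // Hypothetical dictionary that contains "lymph node" but not "lymph nodes".
    static final Set<String> DICTIONARY = new HashSet<>();

    static {
        NORMALIZED.put("nodes", "node");
        DICTIONARY.add("lymph node");
    }

    /**
     * Returns true if any combination of surface forms and normalized forms
     * of the tokens is a dictionary entry. Each token contributes its own
     * text and, in addition, its normalized form when one is known.
     */
    static boolean inDictionary(List<String> tokens) {
        List<String> candidates = new ArrayList<>();
        candidates.add("");
        for (String token : tokens) {
            // Surface form first, then the normalized form (deduplicated).
            Set<String> forms = new LinkedHashSet<>();
            forms.add(token);
            forms.add(NORMALIZED.getOrDefault(token, token));

            List<String> next = new ArrayList<>();
            for (String prefix : candidates) {
                for (String form : forms) {
                    next.add(prefix.isEmpty() ? form : prefix + " " + form);
                }
            }
            candidates = next;
        }
        for (String candidate : candidates) {
            if (DICTIONARY.contains(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "nodes" normalizes to "node", so the phrase matches "lymph node".
        System.out.println(inDictionary(Arrays.asList("lymph", "nodes")));   // true
        // No surface or normalized variant of this phrase is in the dictionary.
        System.out.println(inDictionary(Arrays.asList("lymph", "vessels"))); // false
    }
}
```

The sketch expands every surface/normalized combination, which is fine for
short phrases but grows quickly with phrase length; a real lookup would likely
bound the window size.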
