ctakes-dev mailing list archives

From andy mcmurry <mcmurry.a...@gmail.com>
Subject Re: lvg entries
Date Fri, 18 Apr 2014 22:14:54 GMT
+1 false ... I think

I just wonder what side effects there might be to tweaking LVG.


On Fri, Apr 18, 2014 at 11:56 AM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:

> +1 false
>
> -----Original Message-----
> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
> Sent: Friday, April 18, 2014 2:54 PM
> To: dev@ctakes.apache.org
> Subject: Re: lvg entries
>
> Thanks for tracking that down, Andy.
>
> I am making a pass at UimaFit-izing the configuration parameters for all
> the annotators in the default pipeline, before I create the static factory
> methods like we recently discussed. Should I go ahead and change PostLemmas
> so that it defaults to false?
>
> Tim
>
>
> On 04/18/2014 12:47 AM, andy mcmurry wrote:
> > There is a lot of config handling; maybe PostLemmas is being set to
> > true, or configInit() is not setting up the NLM wrapper correctly.
> >
> > ctakes-lvg *README*
> > Note: as distributed, PostLemmas is set to false.  This is done to
> > reduce the size of the CAS.
> > Set PostLemmas to true to have org.apache.ctakes.typesystem.type.Lemma
> > annotations added to the CAS.
> >
> > *LvgAnnotator.xml*
> > PostLemmas = True
> >
> > *LvgAnnotator.java*
> > if (postLemmas) {
> >      lvgResource.getLvgLex()
> > }
> >
> > On Thu, Apr 17, 2014 at 3:23 PM, Masanz, James J. <Masanz.James@mayo.edu> wrote:
> >
> >> The normalizedForm field is filled in. It is used by dictionary lookup.
> >>
> >> So, for example, if the dictionary contains "lymph node" but not
> >> "lymph nodes", a document with the text "lymph nodes" would match the
> >> dictionary entry "lymph node" because "node", being the normalized
> >> form of "nodes", would be used when searching dictionary entries (in
> >> addition to searching dictionary entries for "nodes").
> >>
> >> -----Original Message-----
> >> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
> >> Sent: Thursday, April 17, 2014 4:33 PM
> >> To: dev@ctakes.apache.org
> >> Subject: Re: lvg entries
> >>
> >> Quick follow-up since I was interested. The current dependency parser
> >> does have the option to use cTAKES lemmas or do its own lemmatizing,
> >> but it doesn't use the lemma field; it uses the normalizedForm
> >> field. I'm not sure if that field is ever actually filled in: on my
> >> example data it is always null.
> >>
> >> Tim
> >>
> >> On 04/17/2014 01:57 PM, Masanz, James J. wrote:
> >>> Offhand I recall at least one of the dependency parsers used the
> >>> Lemma annotations at one point.
> >>> Not sure if it still does.
> >>>
> >>> There is an option for turning off the posting of the lemmas to the
> >>> CAS.
> >>>
> >>> Hope that helps
> >>>
> >>> -----Original Message-----
> >>> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
> >>> Sent: Thursday, April 17, 2014 11:27 AM
> >>> To: dev@ctakes.apache.org
> >>> Subject: lvg entries
> >>>
> >>> The LVG annotator creates an enormous number of "lemmas" for every
> >>> WordToken in the CAS, and I'm wondering what the original purpose
> >>> was? I think this is probably a minor bottleneck for speed but
> >>> mostly a pretty big space hog (at least 50% of the space of xmi files
> >>> in my tests).
> >>>
> >>> As of right now I'm not sure if any downstream components are using
> >>> these lemmas, and on a manual inspection the precision seems to be
> >>> pretty abysmal (meaning most of them are nonsensical as lexical
> >>> variants), so as I said, just wondering if we can revisit why cTAKES
> >>> generates so many and whether that component can be optimized.
> >>>
> >>> Thanks
> >>> Tim
> >>>
> >>>
> >>
>
>
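
Note on the normalizedForm lookup described in the quoted Apr 17 message:
the "lymph nodes" / "lymph node" example can be sketched roughly as below.
This is only an illustration of the idea, not the cTAKES dictionary lookup
code. The class NormalizedLookupSketch, the Token type, and the methods
candidates() and matches() are hypothetical names made up for this sketch,
and the assumption that each token carries a single normalized form (or
null) is a simplification.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: NOT the cTAKES implementation.
class NormalizedLookupSketch {

    // Stand-in for a word token: its covered text plus a normalized form,
    // which may be null when no normalized form was produced.
    static final class Token {
        final String text;
        final String normalizedForm;

        Token(String text, String normalizedForm) {
            this.text = text;
            this.normalizedForm = normalizedForm;
        }
    }

    // Build every candidate phrase for a span by choosing, per token,
    // either the original text or (if present) the normalized form.
    static Set<String> candidates(List<Token> tokens) {
        Set<String> result = new HashSet<>();
        result.add("");
        for (Token t : tokens) {
            Set<String> next = new HashSet<>();
            for (String prefix : result) {
                String sep = prefix.isEmpty() ? "" : " ";
                next.add(prefix + sep + t.text.toLowerCase());
                if (t.normalizedForm != null) {
                    next.add(prefix + sep + t.normalizedForm.toLowerCase());
                }
            }
            result = next;
        }
        return result;
    }

    // A span matches if any candidate phrase is a dictionary entry.
    static boolean matches(Set<String> dictionary, List<Token> tokens) {
        for (String candidate : candidates(tokens)) {
            if (dictionary.contains(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Dictionary has the singular entry only.
        Set<String> dictionary = new HashSet<>(Arrays.asList("lymph node"));
        List<Token> span = Arrays.asList(
                new Token("lymph", "lymph"),
                new Token("nodes", "node"));
        System.out.println(matches(dictionary, span)); // prints "true"
    }
}
```

With a null normalizedForm on every token (as in the example data mentioned
in the thread), only the literal phrase "lymph nodes" would be tried, so the
singular dictionary entry would not match in this sketch.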
