ctakes-dev mailing list archives

From Lee Becker <lee.bec...@gmail.com>
Subject Re: ClearNLP POSTagger
Date Mon, 08 Apr 2013 18:28:16 GMT
On Mon, Apr 8, 2013 at 12:04 PM, Steven Bethard <steven.bethard@colorado.edu> wrote:

> > While working on the Dependency Parser/SRL labeler, we also have a
> > POSTagger from ClearNLP. It is fairly simple and I have the code ready
> > (also trained on the same data as the dep parser, MiPACQ/SHARP) to be
> > checked in. What do folks think?
> > We can include both Analysis Engines in the ctakes-pos-tagger project.
> > But should we leave the current OpenNLP in the default pipeline or default
> > to the latest?
>
> My vote would be to default to whatever has the best performance.
> Presumably the ClearNLP one?
>
> > "The ClearNLP POS tagger shows more robust results on unknown words by
> > generalizing lexical features.
>
> Looking at the paper, the ClearNLP POS tagger is not compared directly to the
> cTAKES OpenNLP POS tagger, but they do outperform the Stanford tagger
> trained on the same data, so it's probably a reasonable guess that they're
> more accurate than the OpenNLP tagger.
>
> > It also uses AdaGrad for machine learning, which is a more advanced
> > learning algorithm than maximum entropy used by OpenNLP."
>
> My opinion is that we should never include a model in cTAKES just because
> it has a "more advanced learning algorithm". "More advanced learning
> algorithm" does not always translate into better performance.

If my memory is serving me correctly, I think Jinho trained his parsers off
of predicted POS tags to eke out the extra performance.  The takeaway
being that ClearNLP does better when you can use as much of its pipeline as
