ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: ClearNLP POSTagger
Date Mon, 08 Apr 2013 19:21:17 GMT
Okay, 
I'll commit the ClearPOSTagger and make it available in the ctakes-pos-tagger component, but
leave everything else as it currently is (OpenNLP remains the default).
We can always switch from one to the other in the future (once there is a fair comparison/benchmark).

Note: I think there is a pretty significant speed improvement in the ClearPOSTagger as well.

> -----Original Message-----
> From: Lee Becker [mailto:lee.becker@gmail.com]
> Sent: Monday, April 08, 2013 2:29 PM
> To: dev@ctakes.apache.org
> Subject: Re: ClearNLP POSTagger
> 
> On Mon, Apr 8, 2013 at 12:04 PM, Steven Bethard <steven.bethard@colorado.edu> wrote:
> 
> > > While working on the Dependency Parser/SRL labeler, we also have a
> > > POSTagger from ClearNLP. It is fairly simple and I have the code
> > > ready (also trained on the same data as the dep parser: MiPaq/SHARP)
> > > to be checked in. What do folks think?
> > > We can include both Analysis Engines in the ctakes-pos-tagger project.
> > > But should we leave the current OpenNLP in the default pipeline or
> > > default to the latest?
> >
> > My vote would be to default for whatever has the best performance.
> > Presumably the ClearNLP one?
> >
> > > "The ClearNLP POS tagger shows more robust results on unknown words
> > > by generalizing lexical features.
> >
> > Looking at the paper, ClearNLP POS tagger is not compared directly to
> > the cTAKES OpenNLP POS tagger, but they do outperform the Stanford
> > tagger trained on the same data, so it's probably a reasonable guess
> > that they're more accurate than the OpenNLP tagger.
> >
> > > It also uses AdaGrad for machine learning, which is a more advanced
> > > learning algorithm than maximum entropy used by OpenNLP."
> >
> > My opinion is that we should never include a model in cTAKES just
> > because it has a "more advanced learning algorithm". "More advanced
> > learning algorithm" does not always translate into better performance.
> 
> 
> If my memory serves me correctly, I think Jinho trained his parsers on
> predicted POS tags to eke out the extra performance. The takeaway is
> that ClearNLP does better when you can use as much of its pipeline as
> possible.
