ctakes-dev mailing list archives

From Steven Bethard <steven.beth...@Colorado.EDU>
Subject Re: ClearNLP POSTagger
Date Mon, 08 Apr 2013 18:04:37 GMT
On Apr 8, 2013, at 10:15 AM, "Chen, Pei" <Pei.Chen@childrens.harvard.edu> wrote:
> While working on the Dependency Parser/SRL labeler, we also have a POSTagger from ClearNLP.
> It is fairly simple and I have the code ready (also trained on the same data as the dep parser -
> MiPaq/SHARP) to be checked-in.  What does the folks think:
> We can include both Analysis Engines in the ctakes-pos-tagger project.  But should we
> leave the current OpenNLP in the default pipeline or default to the latest?

My vote would be to default to whichever has the best performance. Presumably the ClearNLP
one?

> "The ClearNLP POS tagger shows more robust results on unknown words by generalizing lexical
> features.

Looking at the paper, the ClearNLP POS tagger is not compared directly to the cTAKES OpenNLP POS
tagger, but it does outperform the Stanford tagger trained on the same data, so it's probably
a reasonable guess that it is more accurate than the OpenNLP tagger.

> It also uses AdaGrad for machine learning, which is a more advanced learning algorithm
> than maximum entropy used by OpenNLP."

My opinion is that we should never include a model in cTAKES just because it has a "more advanced
learning algorithm". "More advanced learning algorithm" does not always translate into better
performance.

Steve