ctakes-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: ClearNLP POSTagger
Date Tue, 09 Apr 2013 08:11:20 GMT
Would it be possible to run some benchmarks so we know the performance 
difference between the two?
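
A comparison along these lines could start from something like the sketch below. It is only an illustration: the `benchmark` function, the toy taggers, and the tiny gold sample are all made up for this message, and neither OpenNLP nor ClearNLP is called. Each real tagger would need to be wrapped as a function that maps a token list to a tag list.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a tagger maps a list of tokens to a list of tags.
Tagger = Callable[[Sequence[str]], List[str]]
Sentence = Tuple[List[str], List[str]]  # (tokens, gold tags)


def benchmark(tagger: Tagger, gold: Sequence[Sentence]) -> Dict[str, float]:
    """Return token-level accuracy and wall-clock time for one tagger."""
    correct = 0
    total = 0
    start = time.perf_counter()
    for tokens, gold_tags in gold:
        predicted = tagger(tokens)
        if len(predicted) != len(gold_tags):
            raise ValueError("tagger must return one tag per token")
        correct += sum(p == g for p, g in zip(predicted, gold_tags))
        total += len(gold_tags)
    elapsed = time.perf_counter() - start
    return {
        "accuracy": correct / total if total else 0.0,
        "tokens": float(total),
        "seconds": elapsed,
    }


# Toy stand-ins for the two taggers under comparison (assumptions only).
LEXICON = {"the": "DT", "no": "DT", "denies": "VBZ"}


def always_nn(tokens: Sequence[str]) -> List[str]:
    return ["NN" for _ in tokens]


def lookup_tagger(tokens: Sequence[str]) -> List[str]:
    return [LEXICON.get(t.lower(), "NN") for t in tokens]


# Made-up gold sample; a real run would use held-out annotated data.
GOLD: List[Sentence] = [
    (["the", "patient", "denies", "pain"], ["DT", "NN", "VBZ", "NN"]),
    (["no", "fever"], ["DT", "NN"]),
]

if __name__ == "__main__":
    for name, tagger in [("always_nn", always_nn), ("lookup", lookup_tagger)]:
        print(name, benchmark(tagger, GOLD))
```

Running both taggers over the same held-out sentences keeps the accuracy and timing numbers directly comparable.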

The OpenNLP POS Tagger can be customized: it is currently possible to
replace the feature generation. The default feature generation is tuned
for the news domain, so it can probably be optimized for the medical
domain. Replacing the learning algorithm is not possible yet, but we
will work on that for the next release.
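
To give an idea of what domain-specific feature generation might look like, here is a rough sketch. It does not use the OpenNLP API; `token_features` and the feature names are hypothetical, and the particular features (suffixes, digits, hyphens, capitalization) are only guesses at what could help with clinical vocabulary.

```python
from typing import List


def token_features(tokens: List[str], i: int) -> List[str]:
    """Build string features for the token at position i (illustrative only)."""
    word = tokens[i]
    lower = word.lower()
    feats = ["w=" + lower]

    # Suffixes help generalize to unseen terms (e.g. "-itis", "-ectomy").
    for n in (2, 3, 4):
        if len(lower) > n:
            feats.append("suf%d=%s" % (n, lower[-n:]))

    # Surface-shape cues that are common in drug names, doses, and acronyms.
    if any(c.isdigit() for c in word):
        feats.append("has_digit")
    if "-" in word:
        feats.append("has_hyphen")
    if word.isupper() and len(word) > 1:
        feats.append("all_caps")

    # Neighboring words, with boundary markers at the sentence edges.
    feats.append("prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"))
    feats.append("next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"))
    return feats


if __name__ == "__main__":
    print(token_features(["Pt", "received", "5-FU", "today"], 2))
```

Whether such features actually beat the default ones would have to be checked on annotated clinical data.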

Do you use a tag dictionary? Maybe it is possible to generate one from
the dictionaries already used by cTAKES.
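
As a sketch of the idea, assuming the existing dictionaries can be exported as (term, POS tag) pairs, which is an assumption on my side and not a description of the actual cTAKES resources, a tag dictionary could be built roughly like this. The function names and sample entries are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_tag_dictionary(entries: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each single-token term to the set of tags seen for it."""
    tag_dict: Dict[str, Set[str]] = defaultdict(set)
    for term, tag in entries:
        term = term.strip().lower()
        # Skip empty and multi-word terms; a POS tag dictionary is per token.
        if term and " " not in term:
            tag_dict[term].add(tag)
    return dict(tag_dict)


def allowed_tags(tag_dict: Dict[str, Set[str]], word: str, all_tags: Set[str]) -> Set[str]:
    """Restrict the candidate tags for known words; unknown words keep all tags."""
    return tag_dict.get(word.lower(), set(all_tags))


if __name__ == "__main__":
    # Made-up sample entries standing in for an exported lexicon.
    sample = [("Aspirin", "NN"), ("bleed", "VB"), ("bleed", "NN"), ("chest pain", "NN")]
    tags = {"NN", "VB", "JJ"}
    d = build_tag_dictionary(sample)
    print(allowed_tags(d, "bleed", tags))
    print(allowed_tags(d, "unknownword", tags))
```

The tagger would then only consider the listed tags for words found in the dictionary.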

Jörn

On 04/08/2013 06:15 PM, Chen, Pei wrote:
> Hi,
> While working on the Dependency Parser/SRL labeler, we also have a
> POSTagger from ClearNLP. It is fairly simple and I have the code ready
> to be checked in (also trained on the same data as the dep parser:
> MiPaq/SHARP). What do folks think?
> We can include both Analysis Engines in the ctakes-pos-tagger project.
> But should we leave the current OpenNLP in the default pipeline or
> default to the latest?
>
> "The ClearNLP POS tagger shows more robust results on unknown words by
> generalizing lexical features. You can find the reference in this
> paper:
> Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection,
> Jinho D. Choi, Martha Palmer, Proceedings of the 50th Annual Meeting of
> the Association for Computational Linguistics (ACL'12), 363-367, Jeju,
> Korea, 2012. [1] It also uses AdaGrad for machine learning, which is a
> more advanced learning algorithm than maximum entropy used by OpenNLP."
>
> [1] http://aclweb.org/anthology-new/P/P12/P12-2071.pdf
>

