ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: ClearNLP POSTagger
Date Mon, 08 Apr 2013 21:12:51 GMT
Hi Richard,
Yes, the ClearNLP tools (POSTagger, Dependency Parser, SRL) in cTAKES were retrained with
additional data (MiPAQ/SHARP).
The Dependency Parser and SRL replaced the existing ones because the old ClearParser versions
were no longer supported.

The ClearPOSTagger wasn't previously available in cTAKES, but we can certainly make it an
optional one in case some folks may want to use it.  I'll leave the default one (OpenNLP)
as-is for the time being until we get some more users/tests/benchmarks/feedback...

--Pei

> -----Original Message-----
> From: Richard Eckart de Castilho [mailto:eckart@ukp.informatik.tu-darmstadt.de]
> Sent: Monday, April 08, 2013 1:43 PM
> To: <dev@ctakes.apache.org>
> Subject: Re: ClearNLP POSTagger
> 
> Hi,
> 
> did you train new models for the ClearNLP/OpenNLP tools? (Maybe I would know if
> I had followed a past discussion on models more closely.)
> 
> Cheers,
> 
> -- Richard
> 
> Am 08.04.2013 um 18:15 schrieb "Chen, Pei"
> <Pei.Chen@childrens.harvard.edu>:
> 
> > Hi,
> > While working on the Dependency Parser/SRL labeler,  we also have a
> POSTagger from ClearNLP.  It is fairly simple and I have the code ready (also
> trained on the same data as the dep parser- MiPaq/SHARP) to be checked-in.
> What do folks think:
> > We can include both Analysis Engines in the ctakes-pos-tagger project.  But
> should we leave the current OpenNLP in the default pipeline or default to
> the latest?
> >
> > "The ClearNLP POS tagger shows more robust results on unknown words
> by generalizing lexical features.  You can find the reference from this paper.
> > Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection,
> Jinho D. Choi, Martha Palmer, Proceedings of the 50th Annual Meeting of the
> Association for Computational Linguistics (ACL'12), 363-367, Jeju, Korea, 2012.
> [1] It also uses AdaGrad for machine learning, which is a more advanced
> learning algorithm than maximum entropy used by OpenNLP."
> >
> > [1] http://aclweb.org/anthology-new/P/P12/P12-2071.pdf
> 
> 
> --
> -------------------------------------------------------------------
> Richard Eckart de Castilho
> Technical Lead
> Ubiquitous Knowledge Processing Lab (UKP-TUD)
> FB 20 Computer Science Department
> Technische Universität Darmstadt
> Hochschulstr. 10, D-64289 Darmstadt, Germany phone [+49] (0)6151 16-7477,
> fax -5455, room S2/02/B117 eckart@ukp.informatik.tu-darmstadt.de
> www.ukp.tu-darmstadt.de
> Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
> -------------------------------------------------------------------

