ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: ClearNLP POSTagger
Date Tue, 09 Apr 2013 20:42:38 GMT
Thanks, James.
Good idea. It's been moved to a clearnlp folder now (to indicate that it's a ClearNLP model):
org/apache/ctakes/postagger/models/clearnlp/mayo-en-pos-1.3.0.jar

Let me know if you get a chance to try it out or run some benchmarks to see how it performs
against the current one.

--Pei

> -----Original Message-----
> From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
> Sent: Tuesday, April 09, 2013 4:31 PM
> To: 'dev@ctakes.apache.org'
> Subject: RE: ClearNLP POSTagger
> 
> That's great. Thanks.
> 
> Is there something that describes which model to use for which AE?
> Or maybe put something in the model filename, or put the model in a
> separate subdirectory?
> 
> -- James
> 
> 
> > -----Original Message-----
> > From: dev-return-1482-Masanz.James=mayo.edu@ctakes.apache.org
> > [mailto:dev-return-1482-Masanz.James=mayo.edu@ctakes.apache.org] On
> > Behalf Of Chen, Pei
> > Sent: Tuesday, April 09, 2013 3:29 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: ClearNLP POSTagger
> >
> > FYI:
> > This has been done in trunk in r. 1466216
> > https://issues.apache.org/jira/browse/CTAKES-186
> > If you would like to try it out or run some benchmarks before we
> > decide whether to make the default pipeline use this, just uncomment
> > the following in your Aggregate Descriptors.
> >
> > <delegateAnalysisEngine key="ClearPOSTagger">
> >   <import location="../../../ctakes-pos-tagger/desc/ClearNLPPOSTagger.xml"/>
> > </delegateAnalysisEngine>
> > <node>ClearPOSTagger</node>
> >
> > > -----Original Message-----
> > > From: Chen, Pei [mailto:Pei.Chen@childrens.harvard.edu]
> > > Sent: Monday, April 08, 2013 5:14 PM
> > > To: dev@ctakes.apache.org
> > > Subject: RE: ClearNLP POSTagger
> > >
> > > Hi Richard,
> > > Yes, the ClearNLP tools (POSTagger, Dependency Parser, SRL) in
> > > cTAKES were retrained with additional data (MiPAQ/SHARP).
> > > The Dependency Parser/SRL replaced the existing ones because the old
> > > ClearParser ones were no longer supported.
> > >
> > > The ClearPOSTagger wasn't previously available in cTAKES, but we can
> > > certainly make it an optional one in case some folks may want to use
> > > it.  I'll leave the default one (OpenNLP) as-is for the time being
> > > until we get some more users/tests/benchmarks/feedback...
> > >
> > > --Pei
> > >
> > > > -----Original Message-----
> > > > From: Richard Eckart de Castilho
> > > > [mailto:eckart@ukp.informatik.tu-darmstadt.de]
> > > > Sent: Monday, April 08, 2013 1:43 PM
> > > > To: <dev@ctakes.apache.org>
> > > > Subject: Re: ClearNLP POSTagger
> > > >
> > > > Hi,
> > > >
> > > > Did you train new models for the ClearNLP/OpenNLP tools? (Maybe I
> > > > would know if I had followed a past discussion on models more closely.)
> > > >
> > > > Cheers,
> > > >
> > > > -- Richard
> > > >
> > > > Am 08.04.2013 um 18:15 schrieb "Chen, Pei"
> > > > <Pei.Chen@childrens.harvard.edu>:
> > > >
> > > > > Hi,
> > > > > While working on the Dependency Parser/SRL labeler, we also have a
> > > > > POSTagger from ClearNLP. It is fairly simple and I have the code
> > > > > ready (also trained on the same data as the dep parser,
> > > > > MiPaq/SHARP) to be checked in.
> > > > > What do folks think:
> > > > > We can include both Analysis Engines in the ctakes-pos-tagger
> > > > > project. But should we leave the current OpenNLP in the default
> > > > > pipeline or default to the latest?
> > > > >
> > > > > "The ClearNLP POS tagger shows more robust results on unknown
> > > > > words by generalizing lexical features. You can find the reference
> > > > > in this paper: Fast and Robust Part-of-Speech Tagging Using Dynamic
> > > > > Model Selection, Jinho D. Choi, Martha Palmer, Proceedings of the
> > > > > 50th Annual Meeting of the Association for Computational
> > > > > Linguistics (ACL'12), 363-367, Jeju, Korea, 2012. [1] It also uses
> > > > > AdaGrad for machine learning, which is a more advanced learning
> > > > > algorithm than maximum entropy used by OpenNLP."
> > > > >
> > > > > [1] http://aclweb.org/anthology-new/P/P12/P12-2071.pdf
> > > >
> > > >
> > > > --
> > > > -------------------------------------------------------------------
> > > > Richard Eckart de Castilho
> > > > Technical Lead
> > > > Ubiquitous Knowledge Processing Lab (UKP-TUD)
> > > > FB 20 Computer Science Department
> > > > Technische Universität Darmstadt
> > > > Hochschulstr. 10, D-64289 Darmstadt, Germany
> > > > phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
> > > > eckart@ukp.informatik.tu-darmstadt.de
> > > > www.ukp.tu-darmstadt.de
> > > > Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
> > > > -------------------------------------------------------------------
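
For readers following the thread: the two-part snippet quoted in the 3:29 PM message goes into different sections of a UIMA aggregate descriptor. The sketch below shows roughly where each part might sit. Only the ClearPOSTagger key, the import location, and the node name come from the thread. The descriptor name ExampleAggregate and the elided neighbouring delegates are assumptions for illustration, and the thread does not say whether the existing OpenNLP tagger node should be commented out at the same time.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical aggregate descriptor: only the two ClearPOSTagger
     pieces below are taken from the thread. -->
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>
  <delegateAnalysisEngineSpecifiers>
    <!-- other delegates elided -->
    <delegateAnalysisEngine key="ClearPOSTagger">
      <import location="../../../ctakes-pos-tagger/desc/ClearNLPPOSTagger.xml"/>
    </delegateAnalysisEngine>
  </delegateAnalysisEngineSpecifiers>
  <analysisEngineMetaData>
    <!-- ExampleAggregate is a placeholder name, not a cTAKES descriptor -->
    <name>ExampleAggregate</name>
    <flowConstraints>
      <fixedFlow>
        <!-- upstream nodes elided -->
        <node>ClearPOSTagger</node>
      </fixedFlow>
    </flowConstraints>
  </analysisEngineMetaData>
</analysisEngineDescription>
```

The key on the delegateAnalysisEngine element and the text of the node element must match, since the flow refers to delegates by key.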

