From: "Chen, Pei"
To: "dev@ctakes.apache.org"
Subject: RE: ClearNLP POSTagger
Date: Mon, 8 Apr 2013 21:12:51 +0000

Hi Richard,

Yes, the ClearNLP tools (POSTagger, Dependency Parser, SRL) in cTAKES were retrained with additional data (MiPACQ/SHARP).
The Dependency Parser/SRL replaced the existing ones because the old ClearParser versions were no longer supported.
The ClearPOSTagger wasn't previously available in cTAKES, but we can certainly make it an optional one in case some folks want to use it. I'll leave the default one (OpenNLP) as-is for the time being until we get some more users/tests/benchmarks/feedback...

--Pei

> -----Original Message-----
> From: Richard Eckart de Castilho [mailto:eckart@ukp.informatik.tu-darmstadt.de]
> Sent: Monday, April 08, 2013 1:43 PM
> To:
> Subject: Re: ClearNLP POSTagger
>
> Hi,
>
> did you train new models for the ClearNLP/OpenNLP tools? (Maybe I would know if
> I had followed a past discussion on models more closely.)
>
> Cheers,
>
> -- Richard
>
> On 08.04.2013 at 18:15, "Chen, Pei" wrote:
>
> > Hi,
> > While working on the Dependency Parser/SRL labeler, we also have a POSTagger from ClearNLP. It is fairly simple and I have the code ready to be checked in (also trained on the same data as the dep parser, MiPACQ/SHARP).
> > What do folks think?
> > We can include both Analysis Engines in the ctakes-pos-tagger project. But should we leave the current OpenNLP in the default pipeline or default to the latest?
> >
> > "The ClearNLP POS tagger shows more robust results on unknown words by generalizing lexical features. You can find the reference from this paper.
> > Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection, Jinho D. Choi, Martha Palmer, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL'12), 363-367, Jeju, Korea, 2012. [1] It also uses AdaGrad for machine learning, which is a more advanced learning algorithm than maximum entropy used by OpenNLP."
> >
> > [1] http://aclweb.org/anthology-new/P/P12/P12-2071.pdf
>
> --
> -------------------------------------------------------------------
> Richard Eckart de Castilho
> Technical Lead
> Ubiquitous Knowledge Processing Lab (UKP-TUD)
> FB 20 Computer Science Department
> Technische Universität Darmstadt
> Hochschulstr. 10, D-64289 Darmstadt, Germany
> phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
> eckart@ukp.informatik.tu-darmstadt.de
> www.ukp.tu-darmstadt.de
> Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
> -------------------------------------------------------------------