From: "Chen, Pei"
To: "dev@ctakes.apache.org", samir chabou
Subject: RE: sentence number in WordToken
Date: Mon, 30 Sep 2013 16:10:45 +0000

Samir,

I think Richard has a good point here. What is the use case that requires adding sentenceNumber() to BaseToken in the TypeSystem?
If it's only temporary, it may be a good idea to do it programmatically with a local variable rather than modifying the type system and having it stored in the CAS...?
Maybe something like:

    boolean a = JCasUtil.isCovered(JCas, BaseToken1, Sentence.class);
    boolean b = JCasUtil.isCovered(JCas, BaseToken2, Sentence.class);

--Pei

> -----Original Message-----
> From: Richard Eckart de Castilho [mailto:rec@apache.org]
> Sent: Monday, September 30, 2013 11:59 AM
> To: dev@ctakes.apache.org; samir chabou
> Subject: Re: sentence number in WordToken
>
> Hi,
>
> if you do many selectCovering calls, it may be faster to call indexCovering
> once and then use the lookup index it produces.
>
> IMHO type systems should not contain information that can easily be
> calculated at runtime (e.g. sentence number, token number, etc.).
>
> Mind, I have no say here ;) Just my personal opinion.
>
> -- Richard
>
> On 30.09.2013, at 16:17, samir chabou wrote:
>
> > Hi Pei,
> >
> > I thought this may have some use ...
> >
> > Because I need to know if two or more word tokens belong to the same
> > sentence, and since WordToken does not define a sentence number feature,
> > I added it to the TypeSystem. These are the steps:
> >
> > 1) I added the sentence number feature for the type BaseToken in the
> > TypeSystem.xml file (I chose the superclass so that the feature is
> > propagated to all subclasses: WordToken, SymbolToken, NumToken ...)
> >
> > 2) In ctakes-core, in TokenizerAnnotatorPTB.java (method annotateRange),
> > I set the new feature
> > (BaseToken.sentenceNumber = sentence.getSentenceNumber()) as shown below:
> >
> >     bta.setSentenceNumber(sentence.getSentenceNumber());
> >     bta.addToIndexes();
> >
> > 3) Run JCasGen from the Type System tab of the aggregate
> >
> > 4) Add the feature in the Source tab of the aggregate
> >
> > Probably I could have used as an alternative:
> >
> >     List<Sentence> list = JCasUtil.selectCovering(aJcas, Sentence.class,
> >         entity1.getBegin(), entity1.getEnd());
> >
> > The issue with this is: if I have many entities to be checked at the same
> > time, or if entity1 is found in many places, I have to add some if
> > conditions to get the sentence number.
> >
> > Thanks
> > Samir
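
For reference, the runtime lookup discussed in this thread (build the token-to-sentence index once, then compare entries instead of storing a sentence number in the type system) can be sketched in plain Java. This is only an illustration under stated assumptions: SentenceLookup, Span, indexSentences and sameSentence are hypothetical names, not cTAKES or uimaFIT API, and Span merely stands in for an annotation's begin/end offsets. In a real pipeline the index would more likely come from the indexCovering call Richard mentions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of cTAKES or uimaFIT.
final class SentenceLookup {

    /** Hypothetical stand-in for an annotation: only character offsets. */
    public static final class Span {
        public final int begin;
        public final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Builds a token -> sentence-position map in a single pass.
     * Assumes both lists are sorted by begin offset and that sentences
     * do not overlap. Tokens not fully inside a sentence get no entry.
     */
    public static Map<Span, Integer> indexSentences(List<Span> sentences, List<Span> tokens) {
        Map<Span, Integer> index = new HashMap<>();
        int s = 0;
        for (Span token : tokens) {
            // Skip sentences that end at or before the start of this token.
            while (s < sentences.size() && sentences.get(s).end <= token.begin) {
                s++;
            }
            if (s < sentences.size()) {
                Span sentence = sentences.get(s);
                if (sentence.begin <= token.begin && token.end <= sentence.end) {
                    index.put(token, s);
                }
            }
        }
        return index;
    }

    /** True only if both tokens were indexed and map to the same sentence. */
    public static boolean sameSentence(Map<Span, Integer> index, Span a, Span b) {
        Integer sa = index.get(a);
        Integer sb = index.get(b);
        return sa != null && sa.equals(sb);
    }
}
```

With an index built once per document, checking many entity pairs becomes a pair of map lookups each, and the sentence position stays a local, computed value rather than a feature stored in the CAS.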