From: digital paula
To: "dev@ctakes.apache.org"
Subject: RE: sentence splitter & forks/branches
Date: Fri, 17 Jan 2014 22:45:44 -0500

Hello again cTAKES Community,

I thought that adding the sentence splitter (with newline sentence-continuation recognition) would be as simple as adding the sectionizer annotator to the Eclipse environment. Per VJ's note, it is not that simple: my understanding is that the standard clinical pipeline requires the assertion and dependency parsers. I've explored some of the changes needed, and at least for assertion it looks like SentenceDetector, SentenceSpan, and likely the SingleDocumentProcessor from the MITRE jar will need to be modified to recognize multi-line sentences, so that the assertion and dependency parsers can be kept in the pipeline.

I would love to devote the time needed to fix the sentence splitter so that it recognizes multi-line sentences, but I need to focus on hacking my way through the cue word issue, because I've been left in the lurch with no response to my posts :-(((((

Regards,
Paula

> Date: Wed, 15 Jan 2014 14:53:17 -0500
> Subject: Re: sentence splitter & forks/branches
> From: vngarla@gmail.com
> To: dev@ctakes.apache.org
>
> It is unfortunately not that trivial, as allowing newlines within sentences
> requires changes to the assertion and dependency parser modules.
>
> If you're not using those AEs you could theoretically build the ytex
> branch, and just add ctakes-ytex-uima.jar and
> ctakes-ytex-uima\desc\analysis_engine\SentenceDetectorAnnotator.xml to your
> existing ctakes install (haven't tried it, but it should work).
>
> -vj
>
> On Wed, Jan 15, 2014 at 1:57 PM, Lingren, Todd wrote:
>
> > I have a general question about forks, specifically the YTEX branch that
> > Vijay mentions.
> > If I wanted to implement just the sentence splitter from YTEX into a
> > currently existing 3.1 install, how would I do that? Is it possible? Or do
> > I have to switch over completely to run from the YTEX branch?
> >
> > Todd Lingren
> > Biomedical Informatics
> > Cincinnati Children's Hospital
> > Todd.Lingren@cchmc.org
> > 513-803-9032
> >
> > -----Original Message-----
> > From: vijay garla [mailto:vngarla@gmail.com]
> > Sent: Wednesday, January 15, 2014 11:34 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: svn commit: r1551805 -
> > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> >
> > The issue is indeed the sentence splitter - negation is limited to words
> > within the sentence, and if newlines are considered sentence boundaries, it
> > doesn't work properly (splitting on newlines breaks many other things as
> > well). The YTEX branch includes a sentence splitter that does not
> > automatically split sentences on newlines.
> >
> > best,
> >
> > vj
> >
> > On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J. wrote:
> >
> > > Hi Paula,
> > >
> > > The sentence detector in 3.1.0 and 3.1.1 (and previous releases)
> > > assumes sentences don't cross line boundaries.
> > > OpenNLP is used to find sentence breaks, but then if newlines are
> > > found, those are also set (within cTAKES, not OpenNLP) to be sentence
> > > breaks.
> > >
> > > (just FYI I haven't had a chance to look at the ytex branch, which the
> > > subject commit is about)
> > >
> > > -- James
> > >
> > > -----Original Message-----
> > > From: dev-return-2375-Masanz.James=mayo.edu@ctakes.apache.org
> > > [mailto:dev-return-2375-Masanz.James=mayo.edu@ctakes.apache.org]
> > > On Behalf Of digital paula
> > > Sent: Tuesday, January 14, 2014 10:25 PM
> > > To: dev@ctakes.apache.org
> > > Subject: RE: svn commit: r1551805 -
> > > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > >
> > > Hello cTAKES Developer Community,
> > >
> > > I'm a little behind on reading posts; this one is from last month.
> > > I think this issue is already addressed in the current release? I'm
> > > still running the previous release, 3.1.0.
> > >
> > > I just noticed something interesting: the negation didn't take when the
> > > negated term is on a different line. When I removed all carriage returns
> > > from the narratives, negation picked it up, as long as the narrative is
> > > treated as one long string. To better explain what I mean, here are two
> > > narrative comments:
> > >
> > > 1. patient did not have diabetes
> > > 2. patient did not have
> > >    diabetes
> > >
> > > Number 1 above got negated but number 2 did not. This might be related
> > > to the issue with the sectionizer. I noticed that when I treated the
> > > narrative as one string, the sectionizer never crashed with the NPE.
> > > The sectionizer serves no purpose if the narrative is one string, but
> > > it's helping me pinpoint the problem.
> > >
> > > Regards,
> > > Paula
> > >
> > > > Date: Thu, 19 Dec 2013 11:04:57 -0500
> > > > Subject: Re: FW: svn commit: r1551805 -
> > > > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > > From: vngarla@gmail.com
> > > > To: dev@ctakes.apache.org
> > > >
> > > > Hi Pei,
> > > >
> > > > I'm not sure if that would solve the problem: the change in the ytex
> > > > branch causes newlines to be ignored (i.e. not treated as a token).
> > > > trunk's sentence splitter splits sentences on newlines, so newlines
> > > > would never be found in a sentence. However, if we had a reproducer we
> > > > could check it fairly easily in the ytex branch.
> > > >
> > > > Best,
> > > >
> > > > VJ
> > > >
> > > > On Thu, Dec 19, 2013 at 10:15 AM, Chen, Pei wrote:
> > > >
> > > > > Vj,
> > > > > Do you think this is what was causing the NPEs [1]?
> > > > > If so, shall we make the same fix in trunk?
> > > > > --Pei
> > > > >
> > > > > [1] http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E
> > > > >
> > > > > -----Original Message-----
> > > > > From: vjapache@apache.org [mailto:vjapache@apache.org]
> > > > > Sent: Tuesday, December 17, 2013 9:15 PM
> > > > > To: commits@ctakes.apache.org
> > > > > Subject: svn commit: r1551805 -
> > > > > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > > >
> > > > > Author: vjapache
> > > > > Date: Wed Dec 18 02:14:13 2013
> > > > > New Revision: 1551805
> > > > >
> > > > > URL: http://svn.apache.org/r1551805
> > > > > Log:
> > > > > add support for sentences that contain newline tokens.
> > > > >
> > > > > Modified:
> > > > >     ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > > >
> > > > > Modified: ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > > > URL: http://svn.apache.org/viewvc/ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java?rev=1551805&r1=1551804&r2=1551805&view=diff
> > > > > ==============================================================================
> > > > > --- ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java (original)
> > > > > +++ ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java Wed Dec 18 02:14:13 2013
> > > > > @@ -32,8 +32,8 @@ import org.apache.uima.jcas.tcas.Annotat
> > > > >  import org.mitre.medfacts.i2b2.api.ApiConcept;
> > > > >  import org.mitre.medfacts.zoner.CharacterOffsetToLineTokenConverter;
> > > > >  import org.mitre.medfacts.zoner.LineAndTokenPosition;
> > > > > -
> > > > >  import org.apache.ctakes.typesystem.type.syntax.BaseToken;
> > > > > +import org.apache.ctakes.typesystem.type.syntax.NewlineToken;
> > > > >  import org.apache.ctakes.typesystem.type.textspan.Sentence;
> > > > >
> > > > >  public class CharacterOffsetToLineTokenConverterCtakesImpl implements CharacterOffsetToLineTokenConverter
> > > > > @@ -78,11 +78,13 @@ public class CharacterOffsetToLineTokenC
> > > > >        for (Annotation current : annotationIndex)
> > > > >        {
> > > > >          BaseToken bt = (BaseToken)current;
> > > > > -        int begin = bt.getBegin();
> > > > > -        int end = bt.getEnd();
> > > > > -
> > > > > -        tokenBeginEndTreeSet.add(begin);
> > > > > -        tokenBeginEndTreeSet.add(end);
> > > > > +        // filter out NewlineToken
> > > > > +        if (!(bt instanceof NewlineToken)) {
> > > > > +          int begin = bt.getBegin();
> > > > > +          int end = bt.getEnd();
> > > > > +          tokenBeginEndTreeSet.add(begin);
> > > > > +          tokenBeginEndTreeSet.add(end);
> > > > > +        }
> > > > >        }
> > > > >      }
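
For anyone following the thread, the newline behaviour that James and VJ describe can be reproduced with a toy model. The sketch below is not cTAKES code: the class NewlineNegationSketch, its two methods, the punctuation-based boundary rule, and the single cue word "not" are all assumptions made up for illustration. It only shows why a negation cue cannot reach a concept once a newline has been turned into a sentence boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of sentence-scoped negation. Hypothetical; not part of cTAKES. */
final class NewlineNegationSketch {

    /** Splits text into sentences; optionally treats every newline as a boundary. */
    static List<String> splitSentences(String text, boolean breakOnNewline) {
        // Assumed rule: '.', '!' or '?' followed by whitespace ends a sentence.
        String boundary = breakOnNewline
                ? "(?<=[.!?])\\s+|\\R+"   // also break on any run of line breaks
                : "(?<=[.!?])\\s+";       // line breaks are ordinary whitespace
        List<String> sentences = new ArrayList<>();
        for (String part : text.split(boundary)) {
            // Collapse remaining whitespace (including kept newlines) to one space.
            String normalized = part.trim().replaceAll("\\s+", " ");
            if (!normalized.isEmpty()) {
                sentences.add(normalized);
            }
        }
        return sentences;
    }

    /** True if the concept shares a sentence with an earlier "not" cue. */
    static boolean isNegated(String text, String concept, boolean breakOnNewline) {
        for (String sentence : splitSentences(text, breakOnNewline)) {
            // Lower-case and drop punctuation so "diabetes." matches "diabetes".
            String cleaned = sentence.toLowerCase().replaceAll("[^a-z0-9 ]", "");
            List<String> words = Arrays.asList(cleaned.split(" "));
            int conceptAt = words.indexOf(concept);
            int cueAt = words.indexOf("not");
            // Negation scope never extends past the sentence being examined.
            if (conceptAt >= 0 && cueAt >= 0 && cueAt < conceptAt) {
                return true;
            }
        }
        return false;
    }
}
```

With Paula's two narratives, isNegated(text, "diabetes", true) returns true for the single-line version and false for the wrapped one, because the cue and the concept end up in different sentences. Passing false for breakOnNewline negates both, which is the effect the ytex sentence splitter is described as aiming for.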