ctakes-dev mailing list archives

From vijay garla <vnga...@gmail.com>
Subject Re: sentence splitter & forks/branches
Date Wed, 15 Jan 2014 19:53:17 GMT
It is unfortunately not that trivial, as allowing newlines within sentences
requires changes to the assertion and dependency parser modules.

If you're not using those AEs, you could theoretically build the ytex
branch and just add ctakes-ytex-uima.jar and
ctakes-ytex-uima\desc\analysis_engine\SentenceDetectorAnnotator.xml to your
existing cTAKES install (I haven't tried it, but it should work).

-vj


On Wed, Jan 15, 2014 at 1:57 PM, Lingren, Todd <Todd.Lingren@cchmc.org> wrote:

> I have a general question about forks, specifically the YTEX branch that
> Vijay mentions.
> If I wanted to implement just the sentence splitter from YTEX into a
> currently existing 3.1 install, how would I do that? Is it possible? Or do
> I have to switch over completely to run from YTEX branch?
>
> Todd Lingren
> Biomedical Informatics
> Cincinnati Children's Hospital
> Todd.Lingren@cchmc.org
> 513-803-9032
>
>
> -----Original Message-----
> From: vijay garla [mailto:vngarla@gmail.com]
> Sent: Wednesday, January 15, 2014 11:34 AM
> To: dev@ctakes.apache.org
> Subject: Re: svn commit: r1551805 -
> /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
>
> The issue is indeed the sentence splitter - negation is limited to words
> within the sentence, and if newlines are considered sentence boundaries, it
> doesn't work properly (splitting on newlines breaks many other things as
> well).  The YTEX branch includes a sentence splitter that does not
> automatically split sentences on newlines.
>
> best,
>
> vj
>
>
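VJ's point is that negation scope is the sentence. If a newline forces a sentence break, the cue ends up in a different sentence from the concept. Below is a toy model of that effect, not cTAKES' assertion code. `SentenceScopedNegation`, its single-cue rule and its period/newline splitter are all hypothetical. The example inputs are Paula's two notes from further down the thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model, not cTAKES code: a negation cue only applies within one
 * sentence, so where the sentence boundaries fall decides the result.
 */
final class SentenceScopedNegation {

    // Hypothetical single cue; real negation detectors use cue lists and scope rules.
    static final String NEGATION_CUE = "did not";

    /** Splits on '.', and additionally on newline when splitOnNewline is true. */
    static List<String> splitSentences(String text, boolean splitOnNewline) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean boundary = c == '.' || (splitOnNewline && c == '\n');
            if (boundary) {
                addIfNotBlank(sentences, current);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(sentences, current);
        return sentences;
    }

    // Flushes the buffered characters as a sentence unless they are only whitespace.
    private static void addIfNotBlank(List<String> sentences, StringBuilder current) {
        String s = current.toString().trim();
        if (!s.isEmpty()) {
            sentences.add(s);
        }
        current.setLength(0);
    }

    /** True if the concept shares at least one sentence with the negation cue. */
    static boolean isNegated(String text, String concept, boolean splitOnNewline) {
        for (String sentence : splitSentences(text, splitOnNewline)) {
            if (sentence.contains(NEGATION_CUE) && sentence.contains(concept)) {
                return true;
            }
        }
        return false;
    }
}
```

With `splitOnNewline` set to `true`, the two-line note `"patient did not have\ndiabetes"` is not negated, which mirrors what Paula saw on 3.1.0. With `false`, it is negated, which is what a splitter that does not break on newlines is meant to allow.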
> On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J. <Masanz.James@mayo.edu> wrote:
>
> > Hi Paula,
> >
> > The sentence detector in 3.1.0 and 3.1.1 (and previous releases)
> > assumes sentences don't cross line boundaries.
> > OpenNLP is used to find sentence breaks, but then if newlines are
> > found, those are also set (within cTAKES, not OpenNLP) to be sentence
> > breaks.
> >
> > (just FYI I haven't had a chance to look at the ytex branch, which the
> > subject commit is about)
> >
> > -- James
> >
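James describes the 3.1.x behaviour as a post-processing pass over the detector's output. Below is a minimal sketch of such a pass. It assumes the upstream detector (OpenNLP in cTAKES) hands back sentence spans as `int[]{begin, end}` pairs with an exclusive end. `NewlineSentenceBreaker` and its methods are hypothetical names for illustration, not taken from the cTAKES source.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: break every detector sentence span at each newline,
 * so that no resulting sentence crosses a line boundary.
 */
final class NewlineSentenceBreaker {

    /** Spans are {begin, end} character offsets into text, end exclusive. */
    static List<int[]> splitAtNewlines(String text, List<int[]> detectorSpans) {
        List<int[]> result = new ArrayList<>();
        for (int[] span : detectorSpans) {
            int start = span[0];
            for (int i = span[0]; i < span[1]; i++) {
                if (text.charAt(i) == '\n') {
                    addIfNonEmpty(result, start, i); // piece before the newline
                    start = i + 1;                   // next piece starts after it
                }
            }
            addIfNonEmpty(result, start, span[1]);   // remainder of the span
        }
        return result;
    }

    // Skips zero-length pieces, e.g. from two consecutive newlines.
    private static void addIfNonEmpty(List<int[]> spans, int begin, int end) {
        if (end > begin) {
            spans.add(new int[] {begin, end});
        }
    }
}
```

Take the note `"patient did not have\ndiabetes"` with one detector span covering the whole note. The pass returns two spans, `patient did not have` and `diabetes`, so a sentence-scoped negation cue in the first can no longer reach the concept in the second.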
> > -----Original Message-----
> > From: dev-return-2375-Masanz.James=mayo.edu@ctakes.apache.org [mailto:dev-return-2375-Masanz.James=mayo.edu@ctakes.apache.org] On Behalf Of digital paula
> > Sent: Tuesday, January 14, 2014 10:25 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: svn commit: r1551805 -
> > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> >
> >
> > Hello cTAKES Developer Community,
> > I'm a little behind on reading posts; this one is from last month.
> > I think this issue is already addressed in the current release? I'm still
> > running the previous release, 3.1.0.
> > I just noticed something interesting: negation isn't detected when the
> > negated term is on a different line. When I removed all carriage returns
> > from the narratives, so that each one is treated as one long string,
> > negation picked it up. To better explain what I mean, here are two
> > narrative comments:
> >
> > 1. patient did not have diabetes
> > 2. patient did not have
> > diabetes
> >
> > Number 1 above got negated but number 2 did not. This might be related
> > to the issue with the sectionizer. I noticed that when I treated the
> > narrative as one string, the sectionizer never crashed with the NPE.
> > Granted, the sectionizer is pointless if the narrative is one string,
> > but it's helping me pinpoint the problem.
> >
> > Regards,
> > Paula
> >
> >
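Paula's workaround can be applied as a pre-processing step before a note enters the pipeline. Below is a minimal sketch using a hypothetical `NoteFlattener` class. It replaces each CR and LF with a space rather than deleting it. Deleting would fuse `have` and `diabetes` into one word in the second example, and replacing also keeps every character offset aligned with the original note. As Paula notes, flattening the note also defeats the sectionizer.

```java
/**
 * Hypothetical pre-processing step: flatten a note onto one line.
 * CR and LF are replaced by spaces (not removed), so the text keeps its
 * length and all character offsets still match the original note.
 */
final class NoteFlattener {

    static String flatten(String note) {
        char[] chars = note.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == '\r' || chars[i] == '\n') {
                chars[i] = ' ';
            }
        }
        return new String(chars);
    }
}
```

A Windows-style line break (`\r\n`) becomes two spaces, so `"patient did not have\r\ndiabetes"` flattens to `"patient did not have  diabetes"` with the same length as the input.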
> > > Date: Thu, 19 Dec 2013 11:04:57 -0500
> > > Subject: Re: FW: svn commit: r1551805 -
> > > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > From: vngarla@gmail.com
> > > To: dev@ctakes.apache.org
> > >
> > > Hi Pei,
> > >
> > > I'm not sure that would solve the problem: the change in the ytex
> > > branch causes newlines to be ignored (i.e. not treated as tokens).
> > > Trunk's sentence splitter splits sentences on newlines, so newlines
> > > would never be found in a sentence. However, if we had a reproducer, we
> > > could check it fairly easily in the ytex branch.
> > >
> > > Best,
> > >
> > > VJ
> > >
> > >
> > > On Thu, Dec 19, 2013 at 10:15 AM, Chen, Pei <Pei.Chen@childrens.harvard.edu> wrote:
> > >
> > > > Vj,
> > > > Do you think this is what was causing the NPEs [1]?
> > > > If so, shall we make the same fix in trunk?
> > > > --Pei
> > > >
> > > > [1]
> > > > http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E
> > > >
> > > > -----Original Message-----
> > > > From: vjapache@apache.org [mailto:vjapache@apache.org]
> > > > Sent: Tuesday, December 17, 2013 9:15 PM
> > > > To: commits@ctakes.apache.org
> > > > Subject: svn commit: r1551805 -
> > > > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > >
> > > > Author: vjapache
> > > > Date: Wed Dec 18 02:14:13 2013
> > > > New Revision: 1551805
> > > >
> > > > URL: http://svn.apache.org/r1551805
> > > > Log:
> > > > add support for sentences that contain newline tokens.
> > > >
> > > > Modified:
> > > >     ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > >
> > > > Modified: ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > > URL: http://svn.apache.org/viewvc/ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java?rev=1551805&r1=1551804&r2=1551805&view=diff
> > > >
> > > > ==============================================================================
> > > > --- ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java (original)
> > > > +++ ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java Wed Dec 18 02:14:13 2013
> > > > @@ -32,8 +32,8 @@ import org.apache.uima.jcas.tcas.Annotat
> > > >  import org.mitre.medfacts.i2b2.api.ApiConcept;
> > > >  import org.mitre.medfacts.zoner.CharacterOffsetToLineTokenConverter;
> > > >  import org.mitre.medfacts.zoner.LineAndTokenPosition;
> > > > -
> > > >  import org.apache.ctakes.typesystem.type.syntax.BaseToken;
> > > > +import org.apache.ctakes.typesystem.type.syntax.NewlineToken;
> > > >  import org.apache.ctakes.typesystem.type.textspan.Sentence;
> > > >
> > > >  public class CharacterOffsetToLineTokenConverterCtakesImpl implements CharacterOffsetToLineTokenConverter
> > > > @@ -78,11 +78,13 @@ public class CharacterOffsetToLineTokenC
> > > >           for (Annotation current : annotationIndex)
> > > >           {
> > > >                   BaseToken bt = (BaseToken)current;
> > > > -                 int begin = bt.getBegin();
> > > > -                 int end = bt.getEnd();
> > > > -
> > > > -                 tokenBeginEndTreeSet.add(begin);
> > > > -                 tokenBeginEndTreeSet.add(end);
> > > > +                 // filter out NewlineToken
> > > > +                 if (!(bt instanceof NewlineToken)) {
> > > > +                         int begin = bt.getBegin();
> > > > +                         int end = bt.getEnd();
> > > > +                         tokenBeginEndTreeSet.add(begin);
> > > > +                         tokenBeginEndTreeSet.add(end);
> > > > +                 }
> > > >           }
> > > >    }
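The effect of the diff can be reproduced outside UIMA. In the sketch below, `Token` and `NewlineToken` are hypothetical stand-ins for cTAKES' `BaseToken` and `NewlineToken` types. A plain `List` replaces the annotation index that the real class iterates. Only the filtering step mirrors the commit.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for cTAKES' BaseToken: just character offsets. */
class Token {
    final int begin;
    final int end;

    Token(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

/** Hypothetical stand-in for cTAKES' NewlineToken. */
final class NewlineToken extends Token {
    NewlineToken(int begin, int end) {
        super(begin, end);
    }
}

final class TokenBoundaryCollector {

    /** Collects the sorted begin/end offsets of every token except newline tokens. */
    static TreeSet<Integer> collectBoundaries(List<Token> tokens) {
        TreeSet<Integer> boundaries = new TreeSet<>();
        for (Token token : tokens) {
            // filter out NewlineToken, as r1551805 does
            if (!(token instanceof NewlineToken)) {
                boundaries.add(token.begin);
                boundaries.add(token.end);
            }
        }
        return boundaries;
    }
}
```

As VJ notes in his reply to Pei, trunk's splitter breaks sentences at every newline, so a sentence there never contains a `NewlineToken`. This filter only matters once the sentence splitter keeps newlines inside sentences, as the ytex one does.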
