ctakes-dev mailing list archives

From vijay garla <vnga...@gmail.com>
Subject Re: sentence splitter & forks/branches
Date Sun, 19 Jan 2014 19:10:27 GMT
The changes to the assertion and dependency parser modules needed to support
multiline sentences are in the ytex branch.  Another pair of eyes and more
testing are always welcome.

On Friday, January 17, 2014, digital paula <cybersation@hotmail.com> wrote:

>
> Hello again cTAKES Community,
>
> I thought that adding the sentence splitter (with newline
> sentence-continuation recognition) would be as simple as adding the
> sectionizer annotator to the Eclipse environment.  I see from VJ's note
> that it's not that simple: my understanding is that the standard clinical
> pipeline requires the assertion and dependency parsers.  I've explored some
> of the changes needed, and at least for assertion it looks like
> SentenceDetector, SentenceSpan, and likely the SingleDocumentProcessor
> from the MITRE jar will need to be modified to recognize multi-line
> sentences, so that the assertion and dependency parsers can be kept in
> the pipeline.  I would love to devote the time needed to fix the sentence
> splitter to recognize multiline sentences, but I need to focus on hacking
> my way through the cue word issue because I've been left in the lurch
> with no response to my posts  :-(((((
>
> Regards,
> Paula
>
> > Date: Wed, 15 Jan 2014 14:53:17 -0500
> > Subject: Re: sentence splitter & forks/branches
> > From: vngarla@gmail.com
> > To: dev@ctakes.apache.org
> >
> > It is unfortunately not that trivial, as allowing newlines within
> > sentences requires changes to the assertion and dependency parser modules.
> >
> > If you're not using those AEs you could theoretically build the ytex
> > branch, and just add ctakes-ytex-uima.jar and
> > ctakes-ytex-uima\desc\analysis_engine\SentenceDetectorAnnotator.xml to
> > your existing ctakes install (haven't tried it, but it should work).
> >
> > -vj
> >
> >
> > On Wed, Jan 15, 2014 at 1:57 PM, Lingren, Todd <Todd.Lingren@cchmc.org> wrote:
> >
> > > I have a general question about forks, specifically the YTEX branch
> > > that Vijay mentions.
> > > If I wanted to implement just the sentence splitter from YTEX into a
> > > currently existing 3.1 install, how would I do that? Is it possible?
> > > Or do I have to switch over completely to run from YTEX branch?
> > >
> > > Todd Lingren
> > > Biomedical Informatics
> > > Cincinnati Children's Hospital
> > > Todd.Lingren@cchmc.org
> > > 513-803-9032
> > >
> > >
> > > -----Original Message-----
> > > From: vijay garla [mailto:vngarla@gmail.com]
> > > Sent: Wednesday, January 15, 2014 11:34 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: svn commit: r1551805 -
> > > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > >
> > > The issue is indeed the sentence splitter - negation is limited to
> > > words within the sentence, and if newlines are considered sentence
> > > boundaries, it doesn't work properly (splitting on newlines breaks many
> > > other things as well).  The YTEX branch includes a sentence splitter
> > > that does not automatically split sentences on newlines.
> > >
> > > best,
> > >
> > > vj
> > >
> > >
> > > On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J. <Masanz.James@mayo.edu> wrote:
> > >
> > > > Hi Paula,
> > > >
> > > > The sentence detector in 3.1.0 and 3.1.1 (and previous releases)
> > > > assumes sentences don't cross line boundaries.
> > > > OpenNLP is used to find sentence breaks, but then if newlines are
> > > > found, those are also set (within cTAKES, not OpenNLP) to be sentence
> > > > breaks.
> > > >
> > > > (just FYI I haven't had a chance to look at the ytex branch, which
> > > > the subject commit is about)
> > > >
> > > > -- James
> > > >
> > > > -----Original Message-----
> > > > From: dev-return-2375-Masanz.James=mayo.edu@ctakes.apache.org [mailto:
> > > > dev-return-2375-Masanz.James=mayo.edu@ctakes.apache.org] On Behalf Of
> > > > digital paula
> > > > Sent: Tuesday, January 14, 2014 10:25 PM
> > > > To: dev@ctakes.apache.org
> > > > Subject: RE: svn commit: r1551805 -
> > > > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > >
> > > > Hello cTAKES Developer Community,
> > > > I'm a little behind on reading posts....this one is from last month.
> > > > I think this issue is already addressed in current release? I'm still
> > > > running the previous release...3.1.0.
> > > > I just noticed something interesting, the negation didn't take when
> > > > it is on a different line.  I just removed all carriage returns from
> > > > narratives and negation picked it up as long as it's treated as one
> > > > long string. To
>
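
The behaviour described in this thread (negation limited to the words of one
sentence, and every newline treated as a sentence break) can be illustrated
with a small toy sketch. This is not cTAKES, YTEX, or OpenNLP code: the
function names, the cue list, and the punctuation-based splitting rule are all
assumptions made purely for illustration, and the real sentence detector and
assertion module are considerably more involved.

```python
import re

# Hypothetical, tiny cue list; the real negation/assertion cues differ.
NEGATION_CUES = {"no", "denies", "without"}


def split_sentences(text, break_on_newline):
    """Split after sentence-final punctuation; optionally also at every newline."""
    if break_on_newline:
        pattern = r"(?<=[.!?])\s+|\n"   # newline always ends a sentence
    else:
        pattern = r"(?<=[.!?])\s+"      # newline is ordinary whitespace
    parts = re.split(pattern, text)
    # Normalise internal whitespace and drop empty fragments.
    return [" ".join(p.split()) for p in parts if p.strip()]


def is_negated(term, sentences):
    """True if a negation cue precedes `term` within the same sentence."""
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        if term in tokens:
            return any(tok in NEGATION_CUES for tok in tokens[: tokens.index(term)])
    return False


# A hard-wrapped note: the cue "denies" and the term "fever" sit on different lines.
note = "Patient denies chest pain, nausea or\nfever. Follow up in two weeks."

for flag in (True, False):
    sentences = split_sentences(note, break_on_newline=flag)
    print(flag, sentences, is_negated("fever", sentences))
```

With newline splitting enabled, "fever" ends up in its own sentence and the
cue "denies" no longer reaches it; with newline splitting disabled, both fall
in one sentence and the negation is found. That matches the observation above
that removing carriage returns from a narrative made negation work.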
