ctakes-dev mailing list archives

From digital paula <cybersat...@hotmail.com>
Subject RE: sentence splitter & forks/branches
Date Mon, 20 Jan 2014 06:56:14 GMT



VJ, I misunderstood what you had written previously. Okay, from your email last month, here's
the link I found for importing into Eclipse as a new instance: https://svn.apache.org/repos/asf/ctakes/branches/ytex

I have a pressing deadline this week but will return. Glad you got the sentence splitter
multi-line issue resolved; I will test it out and also import the sectionizer.
Regards,
Paula
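
For anyone skimming the quoted thread, here is a rough sketch of why negation stops working when newlines are forced to be sentence breaks. This is only a toy illustration in Python, not cTAKES or OpenNLP code: the regex splitter stands in for the real sentence detector, and the names split_sentences, is_negated, the cue word "denies" and the sample note are all made up for the example.

```python
import re


def split_sentences(text, newline_is_boundary=True):
    """Toy splitter: break after '.', '!' or '?'.

    If newline_is_boundary is True, every newline is also treated as a
    sentence break (the behaviour described for 3.1.0/3.1.1 in the thread).
    If False, newlines are folded into spaces before splitting.
    """
    if newline_is_boundary:
        chunks = text.splitlines()
    else:
        chunks = [" ".join(text.split())]
    sentences = []
    for chunk in chunks:
        for piece in re.split(r"(?<=[.!?])\s+", chunk.strip()):
            if piece:
                sentences.append(piece)
    return sentences


def is_negated(text, term, cue="denies", newline_is_boundary=True):
    """Toy negation check: the term counts as negated only when the cue
    appears before it inside the same sentence."""
    for sentence in split_sentences(text, newline_is_boundary):
        lowered = sentence.lower()
        cue_at = lowered.find(cue)
        term_at = lowered.find(term)
        if cue_at != -1 and term_at > cue_at:
            return True
    return False


# Hypothetical note with a hard line wrap in the middle of a sentence.
note = "Patient denies fever, chills or\nchest pain. Reports mild cough."

print(split_sentences(note, newline_is_boundary=True))
print(split_sentences(note, newline_is_boundary=False))
print(is_negated(note, "chest pain", newline_is_boundary=True))
print(is_negated(note, "chest pain", newline_is_boundary=False))
```

With the newline treated as a boundary, "chest pain" ends up in a different sentence from "denies" and the toy check misses the negation; folding the newline into a space keeps them together. That matches what I saw after removing the carriage returns from the narratives.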
 
> Date: Sun, 19 Jan 2014 14:10:27 -0500
> Subject: Re: sentence splitter & forks/branches
> From: vngarla@gmail.com
> To: dev@ctakes.apache.org
> 
> The changes to assertion and dependency parser needed to support multiline
> sentences are in the ytex branch.  another pair of eyes and more testing is
> always welcome
> 
> On Friday, January 17, 2014, digital paula <cybersation@hotmail.com> wrote:
> 
> >
> >
> >
> > Hello again cTAKES Community,  I thought that adding the sentence
> > splitter(w/newline-sentence-continuation-recognition) would have been as
> > simple as it was adding the sectionizer annotator to the eclipse
> > environment.  I see per VJ's note that it's not that simple, my
> > understanding is that the standard clinical pipeline requires the assertion
> > and dependency parsers. I've explored a bit of the changes needed and at
> > least for Assertion looks like SentenceDetector, SentenceSpan, likely the
> > SingleDocumentProcessor from the MITRE jar will need to be modified to
> > recognize multi-line sentences.   This is so the assertion and dependency
> > parsers can be kept in the pipeline.  I would love to devote the time
> > needed to fix the sentence split to recognize sentences that are multiline
> > but I need to focus on hacking my way through the cue word issue because
> > I've been left in the lurch with no response to my posts  :-(((((
> > Regards,
> > Paula
> >
> > > Date: Wed, 15 Jan 2014 14:53:17 -0500
> > > Subject: Re: sentence splitter & forks/branches
> > > > From: vngarla@gmail.com
> > > > To: dev@ctakes.apache.org
> > >
> > > It is unfortunately not that trivial, as allowing newlines within
> > sentences
> > > requires changes to the assertion and dependency parser modules.
> > >
> > > If you're not using those AEs you could theoretically build the ytex
> > > branch, and just add  ctakes-ytex-uima.jar and
> > > ctakes-ytex-uima\desc\analysis_engine\SentenceDetectorAnnotator.xml to
> > your
> > > > existing ctakes install (haven't tried it, but it should work).
> > >
> > > -vj
> > >
> > >
> > > On Wed, Jan 15, 2014 at 1:57 PM, Lingren, Todd <Todd.Lingren@cchmc.org
> > >wrote:
> > >
> > > > I have a general question about forks, specifically the YTEX branch
> > that
> > > > Vijay mentions.
> > > > If I wanted to implement just the sentence splitter from YTEX into a
> > > > currently existing 3.1 install, how would I do that? Is it possible?
> > Or do
> > > > I have to switch over completely to run from YTEX branch?
> > > >
> > > > Todd Lingren
> > > > Biomedical Informatics
> > > > Cincinnati Children's Hospital
> > > > Todd.Lingren@cchmc.org
> > > > 513-803-9032
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: vijay garla [mailto:vngarla@gmail.com]
> > > > Sent: Wednesday, January 15, 2014 11:34 AM
> > > > To: dev@ctakes.apache.org
> > > > Subject: Re: svn commit: r1551805 -
> > > >
> > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > >
> > > > The issue is indeed the sentence splitter - negation is limited to
> > words
> > > > within the sentence, and if newlines are considered sentence
> > boundaries, it
> > > > doesn't work properly (splitting on newlines breaks many other things
> > as
> > > > well).  The YTEX branch includes a sentence splitter that does not
> > > > automatically split sentences on newlines.
> > > >
> > > > best,
> > > >
> > > > vj
> > > >
> > > >
> > > > On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J. <
> > Masanz.James@mayo.edu
> > > > >wrote:
> > > >
> > > > > Hi Paula,
> > > > >
> > > > > The sentence detector in 3.1.0 and 3.1.1 (and previous releases)
> > > > > assumes sentences don't cross line boundaries.
> > > > > OpenNLP is used to find sentence breaks, but then if newlines are
> > > > > found, those are also set (within cTAKES, not OpenNLP) to be sentence
> > > > breaks.
> > > > >
> > > > > (just FYI I haven't had a chance to look at the ytex branch, which
> > the
> > > > > subject commit is about)
> > > > >
> > > > > -- James
> > > > >
> > > > > -----Original Message-----
> > > > > From: dev-return-2375-Masanz.James=mayo.edu@ctakes.apache.org[mailto:
> > > > > dev-return-2375-Masanz.James=mayo.edu@ctakes.apache.org] On Behalf
> > Of
> > > > > digital paula
> > > > > Sent: Tuesday, January 14, 2014 10:25 PM
> > > > > To: dev@ctakes.apache.org
> > > > > Subject: RE: svn commit: r1551805 -
> > > > >
> > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes
> > > > >
> > /assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakes
> > > > > Impl.java
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Hello cTAKES Developer Community,
> > > > >  I'm a little behind on reading posts....this one is from last month.
> > > > > I think this issue is already addressed in current release? I'm still
> > > > > running the previous release...3.1.0.
> > > > > I just noticed something interesting, the negation didn't take when
> > it
> > > > > is on a different line.  I just removed all carriage returns from
> > > > narratives
> > > > > and negation picked it up as long as it's treated as one long string.
> > > > To
> >

 		 	   		  