ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: sentence detector newline behavior
Date Wed, 29 Jan 2014 21:05:06 GMT
+1
There's an example of the configs here :)
https://issues.apache.org/jira/browse/CTAKES-98

I think we should be able to use OpenNLP's Sentence Annotator directly if we no longer need
the custom newline rule(s).
[Or, if we find that a fixed rule is still required, perhaps OpenNLP can support it via config
as well; there doesn't seem to be anything cTAKES-specific about it.]
Is this pending the results of Tim's retraining/evaluation of the new models?

--Pei
> -----Original Message-----
> From: Jörn Kottmann [mailto:kottmann@gmail.com]
> Sent: Wednesday, January 29, 2014 3:55 PM
> To: dev@ctakes.apache.org
> Subject: Re: sentence detector newline behavior
> 
> On 01/27/2014 08:44 PM, Tim Miller wrote:
> >
> > That is a good point, and something I was wondering about. Having now
> > looked at both the ctakes and opennlp code for the sentence splitter
> > it seems like there is a lot of overlap. I would've thought it was
> > just a matter of converting annotations into our type system. So I'm
> > curious if there is some justification for why there seems to be
> > duplication (or if I'm hallucinating it).
> 
> It should be possible (and if not we should make it possible) to directly use
> the opennlp-uima integration. It supports dynamic types which can be
> mapped in the descriptor.
> This would also give you a smooth transition: your existing integration could
> be labeled as deprecated and removed in one of the future releases.
> 
> Jörn
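
For illustration, a fixed newline rule of the kind discussed above could be applied as a post-processing step on whatever spans a statistical sentence detector returns. The sketch below assumes the rule is simply "a line break always ends a sentence"; the thread does not spell out the actual cTAKES rule, so treat that as an assumption. The names `NewlineRule`, `Span`, and `splitOnNewlines` are hypothetical and are not taken from the cTAKES or OpenNLP code; in practice the input spans would come from a detector such as OpenNLP's `SentenceDetectorME.sentPosDetect`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NewlineRule {
    /** Half-open character span [start, end) into the document text (hypothetical helper type). */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /**
     * Assumed rule: every '\n' or '\r' inside a detected sentence ends that sentence.
     * Each input span is cut at line breaks; whitespace-only pieces are dropped.
     */
    static List<Span> splitOnNewlines(String text, List<Span> sentences) {
        List<Span> out = new ArrayList<>();
        for (Span s : sentences) {
            int pieceStart = s.start;
            for (int i = s.start; i <= s.end; i++) {
                // The end of the span closes the last piece; charAt is skipped in that case.
                boolean atEnd = (i == s.end);
                if (atEnd || text.charAt(i) == '\n' || text.charAt(i) == '\r') {
                    addTrimmed(text, pieceStart, i, out);
                    pieceStart = i + 1;
                }
            }
        }
        return out;
    }

    /** Adds [start, end) with surrounding whitespace trimmed, unless nothing is left. */
    private static void addTrimmed(String text, int start, int end, List<Span> out) {
        while (start < end && Character.isWhitespace(text.charAt(start))) start++;
        while (end > start && Character.isWhitespace(text.charAt(end - 1))) end--;
        if (start < end) out.add(new Span(start, end));
    }

    public static void main(String[] args) {
        String text = "Chief complaint\nPatient reports chest pain. No fever.";
        // Stand-in for detector output: the model merged the heading into the first sentence.
        List<Span> detected = Arrays.asList(new Span(0, 43), new Span(44, 53));
        for (Span s : splitOnNewlines(text, detected)) {
            System.out.println(text.substring(s.start, s.end));
        }
    }
}
```

Run on the sample text, this prints the heading "Chief complaint" as its own sentence, followed by "Patient reports chest pain." and "No fever.". Keeping the rule as a separate pass over spans is one way it could stay independent of the model, which is roughly the situation where a retrained model might make the rule unnecessary.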
