ctakes-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: sentence detector newline behavior
Date Sat, 25 Jan 2014 17:03:40 GMT
On 01/25/2014 01:33 PM, Miller, Timothy wrote:
> Thanks Joern,
> I'll try it. My understanding is I just need to give it my training
> data, with the special character I used replaced with the literal string
> "<LF>" and each line in the file is an example sentence.

Yes, exactly.
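
A rough sketch of that preparation step, assuming the sentences are already split and each one uses a single placeholder character where the newline was. The placeholder `§` and the class and method names are hypothetical; the thread does not say which character was actually used. It only uses the Java standard library and does not call OpenNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrainingDataPrep {
    // Hypothetical placeholder character; the actual one is not named in the thread.
    static final String PLACEHOLDER = "\u00A7";

    // Replace the placeholder with the literal string "<LF>" in every sentence.
    // Each returned element is meant to be written as one line of the training file.
    static List<String> toTrainingLines(List<String> sentences) {
        List<String> lines = new ArrayList<>();
        for (String sentence : sentences) {
            lines.add(sentence.replace(PLACEHOLDER, "<LF>"));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "This is a sentence terminated by a new line" + PLACEHOLDER,
                "And this is one more sentence.");
        // One example sentence per line, as described above.
        for (String line : toTrainingLines(sentences)) {
            System.out.println(line);
        }
    }
}
```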

> Just thinking about the cTAKES wrapper -- do your changes make it so
> that we wouldn't need to add the special characters (<LF>,<CR>) to a
> document within the cTAKES sentence detector wrapper?

Right, the sentence detector expects the chars as input, not the tags.

For example:
"This is a sentence terminated by a new line\nAnd this is one more sentence."


> It sounds like we
> would need to add <CR> and <LF> to our eosChars value, it's early (for
> my brain) but I wonder whether that could be a default on the opennlp end?

If you pass them in during training, they are stored in the model
package. All you need to do is instantiate the Sentence Detector and
it should be ready to use.

BTW, there is also a UIMA integration in opennlp-uima; maybe that could
work quite well for cTAKES.

Jörn


