ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: sentence detector newline behavior
Date Wed, 22 May 2013 11:17:56 GMT
That's awesome! It might be worth trying, at least. How does the training
process change? Previously the training data was one sentence per line,
but that could be trouble once newlines are possible mid-sentence
characters. Is there a new representation for the training data, or
would we have to use the training API?
Tim
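
To make the concern concrete, here is a minimal sketch of why a one-sentence-per-line file cannot carry a sentence that contains a newline, next to a span-based alternative. This is not the OpenNLP training format or API; the function names, the sample note, and the span representation are all hypothetical and only illustrate the problem.

```python
from typing import List, Tuple

# Hypothetical span type: (start, end) character offsets into the raw text.
Span = Tuple[int, int]


def parse_one_per_line(data: str) -> List[str]:
    """One-sentence-per-line reading: every non-empty line is a sentence."""
    return [line for line in data.split("\n") if line.strip()]


def sentences_from_spans(text: str, spans: List[Span]) -> List[str]:
    """Span-based reading: the text is kept intact, newlines included,
    and each sentence is a slice of it."""
    return [text[start:end] for start, end in spans]


# Made-up clinical-style note with a newline in the middle of a sentence.
note = "Patient denies chest\npain. No fever."

# The line-based reading cuts the first sentence at the newline.
line_based = parse_one_per_line(note)

# Hand-annotated spans (an assumption for this example) keep it whole.
span_based = sentences_from_spans(note, [(0, 26), (27, 36)])
```

With the line-based reading the newline is always treated as a boundary, so the data can never show the model a newline that is not one. A span-based representation can show both cases.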

On 05/22/2013 05:20 AM, Jörn Kottmann wrote:
> On 05/21/2013 08:00 PM, Steven Bethard wrote:
>> So perhaps we could re-train it to disambiguate newline characters as well?
>>
> Yes, the OpenNLP Sentence Detector supports that out of the box in the
> new 1.5.3 version. You can specify the set of EOS chars to use, but the
> default is still '.', '!' and '?'. If you have special needs you can
> also customize the feature generation. It should probably be possible
> to drop the cTAKES EOS fix for that now.
>
> Let me know if you have any questions or need some help customizing it
> for cTAKES.
>
> Jörn
>
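
The configurable EOS set described in the quoted message can be pictured with a small sketch. It only shows the candidate-finding step (which positions a classifier would then be asked to accept or reject as sentence ends); it is not OpenNLP code, and `eos_candidates` and the sample text are hypothetical names chosen for illustration.

```python
from typing import Iterable, List

# Default set as stated in the thread: '.', '!' and '?'.
DEFAULT_EOS_CHARS = (".", "!", "?")


def eos_candidates(text: str,
                   eos_chars: Iterable[str] = DEFAULT_EOS_CHARS) -> List[int]:
    """Return the offsets of all characters that belong to the EOS set.

    Each offset is only a candidate; a trained model would still decide
    whether it really ends a sentence.
    """
    eos = set(eos_chars)
    return [i for i, ch in enumerate(text) if ch in eos]


# Made-up note where a line break separates two entries.
note = "BP 120/80\nHR 72. Stable"

default_positions = eos_candidates(note)
# Adding the newline to the set makes the line break a candidate as well.
with_newline = eos_candidates(note, DEFAULT_EOS_CHARS + ("\n",))
```

With the default set the newline is never considered, so a separate fix is needed to break on it. Once it is in the set, it is treated like any other candidate and the model can learn when it ends a sentence.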

