ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: sentence detector newline behavior
Date Sun, 26 Jan 2014 22:29:18 GMT

On 01/26/2014 09:59 AM, Jörn Kottmann wrote:
>
> The evaluation should ignore white spaces. I committed now my fix, it 
> would be nice if you can
> test it.
>
> There might be still something wrong. In my test data I replaced all 
> question marks with white spaces, and the result
> is slightly worse than with the original data.
>
> Jörn
Yes, this fixes the whitespace sentence issue, but the evaluation issue
remains. I believe the problem is in SentenceSampleStream: in the
following block, the whitespace trim happens before the <LF> escape tag
is replaced with the \n character. A test sentence that ends with <LF>
therefore keeps a trailing newline that is never trimmed, so its span
will be one character longer than it should be.

>       sentence = sentence.trim();
>       sentence = replaceNewLineEscapeTags(sentence);
>       sentencesString.append(sentence);
>       int end = sentencesString.length();
>       sentenceSpans.add(new Span(begin, end));
>       sentencesString.append(' ');
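
To make the ordering problem concrete, here is a minimal, self-contained
sketch. It is not the OpenNLP source: the class name TrimOrderDemo and the
body of replaceNewLineEscapeTags are assumptions for illustration (I am
assuming the helper simply maps <LF> to \n and <CR> to \r). It compares
the order used in the block above with the reversed order.

```java
// Illustrative sketch only; names and helper body are assumptions,
// not copied from OpenNLP.
class TrimOrderDemo {

    // Assumed stand-in for the helper quoted above: turns the escape
    // tags back into the real line-break characters.
    static String replaceNewLineEscapeTags(String s) {
        return s.replace("<LF>", "\n").replace("<CR>", "\r");
    }

    // Order used in the quoted block: trim first, then replace.
    // The "<LF>" tag is not whitespace, so trim() leaves it alone and
    // the resulting "\n" survives at the end of the sentence.
    static String trimThenReplace(String sentence) {
        return replaceNewLineEscapeTags(sentence.trim());
    }

    // Reversed order: replace first, so trim() can strip the line break.
    static String replaceThenTrim(String sentence) {
        return replaceNewLineEscapeTags(sentence).trim();
    }

    public static void main(String[] args) {
        String sample = "Hello.<LF>";
        System.out.println(trimThenReplace(sample).length()); // 7
        System.out.println(replaceThenTrim(sample).length()); // 6
    }
}
```

With the sample "Hello.<LF>", the first order yields a 7-character
sentence and the second a 6-character one, which matches the
one-character difference in the span end. Whether simply swapping the two
calls is the right fix for SentenceSampleStream is a separate question,
since trimming after the replacement would also strip any line breaks at
the start of a sentence.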

