ctakes-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: sentence detector newline behavior
Date Mon, 27 Jan 2014 11:10:03 GMT
On 01/26/2014 11:29 PM, Miller, Timothy wrote:
> Yes, this fixes the whitespace sentence issue but the evaluation issue
> remains. I believe the problem is in SentenceSampleStream, where in the
> following block the whitespace trim happens before the <LF> character is
> replaced with the \n character. So test sentences that ended with <LF>
> will be one character longer than they should be.
>
>> >       sentence = sentence.trim();
>> >       sentence = replaceNewLineEscapeTags(sentence);
>> >       sentencesString.append(sentence);
>> >       int end = sentencesString.length();
>> >       sentenceSpans.add(new Span(begin, end));
>> >       sentencesString.append(' ');

Yes, that must be the issue. During training the newline is included in
the span, while during detection the whitespace remover creates a span
without the newline character.
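
To make the off-by-one concrete, here is a minimal sketch of the two
orderings. It is not the OpenNLP code: the class name and the helper
bodies are hypothetical, and replaceNewLineEscapeTags is a stand-in that
is assumed to do nothing more than map <LF> to \n.

```java
// Hypothetical demo, not OpenNLP code: shows how the order of trim() and
// <LF> replacement changes the length of a sentence that ends in <LF>.
final class TrimOrderDemo {

    // Stand-in for replaceNewLineEscapeTags; assumed to map "<LF>" to '\n'.
    static String replaceNewLineEscapeTags(String s) {
        return s.replace("<LF>", "\n");
    }

    // Order from the quoted block: trim first, then replace the tag.
    // The tag is not whitespace, so trim() leaves it and a '\n' survives.
    static String trimThenReplace(String s) {
        return replaceNewLineEscapeTags(s.trim());
    }

    // Swapped order: replace first, so trim() also strips the trailing '\n'.
    static String replaceThenTrim(String s) {
        return replaceNewLineEscapeTags(s).trim();
    }
}
```

For an input such as "A sentence.<LF>", the first ordering keeps the
trailing newline and the second drops it, so the resulting spans differ
by one character.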

I suggest that the evaluator simply ignore whitespace differences
between sentences. With that change, my test case produces the expected
performance numbers.
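
Roughly, the idea is to shrink both the reference span and the predicted
span past any leading or trailing whitespace before comparing them. The
sketch below only illustrates that idea; it is not the committed change,
and the class and method names are hypothetical. Spans are plain
begin/end offsets (end exclusive) rather than OpenNLP Span objects.

```java
import java.util.Arrays;

// Hypothetical sketch, not the committed evaluator change: compares two
// spans over the same text while ignoring whitespace at their edges.
final class WhitespaceInsensitiveSpans {

    // Shrinks [begin, end) so that it neither starts nor ends on whitespace.
    static int[] trimSpan(CharSequence text, int begin, int end) {
        while (begin < end && Character.isWhitespace(text.charAt(begin))) {
            begin++;
        }
        while (end > begin && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return new int[] {begin, end};
    }

    // True if the reference and predicted spans cover the same
    // non-whitespace extent of the text.
    static boolean sameIgnoringWhitespace(CharSequence text,
                                          int refBegin, int refEnd,
                                          int predBegin, int predEnd) {
        return Arrays.equals(trimSpan(text, refBegin, refEnd),
                             trimSpan(text, predBegin, predEnd));
    }
}
```

With the text "First one.\nSecond one.", a reference span of 0 to 11
(newline included) and a predicted span of 0 to 10 (newline excluded)
would then count as the same sentence.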

What do you think?

Anyway, I committed the change. Please give it a try.

Jörn
