ctakes-dev mailing list archives

From "Masanz, James J." <Masanz.Ja...@mayo.edu>
Subject RE: sentence detector newline behavior
Date Tue, 21 May 2013 14:11:02 GMT

+1 for adding a boolean parameter, or perhaps a list of section IDs instead.

The sentence detector model was trained on data that always breaks at carriage returns.

Breaking at line endings is important for text that is a list, something like this:

Heart Rate: normal
ENT: negative
EXTRAVASCULAR FINDINGS: Severe prostatic enlargement.

Without breaking at the line ending, the word "negative" would negate "extravascular findings".
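
To make the proposal concrete, here is a rough sketch of the second stage described in the quoted message (splitting already-detected sentences at newlines), guarded by the proposed boolean. This is not the actual SentenceDetector code: the class name, method name, and the breakOnNewlines parameter are all hypothetical, and it works on plain strings rather than annotations and offsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NewlineSplitSketch {

    /**
     * Hypothetical second stage: takes sentences already produced by a
     * first-stage detector and, only if breakOnNewlines is true, splits
     * each one again at every line ending.
     */
    public static List<String> postProcess(List<String> detected, boolean breakOnNewlines) {
        List<String> result = new ArrayList<>();
        for (String sentence : detected) {
            if (!breakOnNewlines) {
                // Parameter off: trust the first-stage boundaries as they are.
                result.add(sentence);
                continue;
            }
            // Parameter on: every non-empty line becomes its own sentence.
            for (String line : sentence.split("\\R")) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    result.add(trimmed);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String list = "Heart Rate: normal\n"
                + "ENT: negative\n"
                + "EXTRAVASCULAR FINDINGS: Severe prostatic enlargement.";
        // Assume the first stage returned the whole list as one sentence.
        System.out.println(postProcess(Arrays.asList(list), true).size());
        System.out.println(postProcess(Arrays.asList(list), false).size());
    }
}
```

With the flag on, the list above comes out as three separate sentences, so "negative" stays with ENT. With the flag off, whatever the first stage returned is kept, which is what wrapped notes would need.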


-----Original Message-----
From: dev-return-1605-Masanz.James=mayo.edu@ctakes.apache.org [mailto:dev-return-1605-Masanz.James=mayo.edu@ctakes.apache.org]
On Behalf Of Miller, Timothy
Sent: Tuesday, May 21, 2013 7:07 AM
To: dev@ctakes.apache.org
Subject: sentence detector newline behavior

The sentence detector always ends a sentence where there are newlines.
This is a problem for some notes (e.g. MIMIC radiology notes) where a
line can wrap in the middle of a sentence at specified character
offsets. In the comments for SentenceDetector, it seems to be split up
very logically in that it first runs the opennlp sentence detector, then
breaks any detected sentence wherever there is a newline. Questions:
1) Would it be good to add a boolean parameter for breaking on newlines?
2) If that section was removed/avoided, does the opennlp sentence
detector give good results given our model? Or is the model trained on
text that always breaks at carriage returns?

Tim
