ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject question about sentence segmentation
Date Mon, 14 Jul 2014 11:50:33 GMT
Just curious about an edge case regarding headers/lists and wondering what people think the
correct behavior and annotation are.

In cases like this:

#1 Dilated esophagus.
#2 Adenocarcinoma

my intuition is that each whole line is one sentence. But there are also cases where the number
is followed by multiple sentences on one line:

1. EGD as a complex procedure. If there is an abnormality, obtain biopsies.

For this example my intuition is not as clear. Should there be a break after the "1.", or should
the first sentence be "1. EGD as a complex procedure."? Again, my intuition leans towards
the latter, but it seems a bit odd since the "1." kind of distributes over all the following
sentences (i.e., it acts like a paragraph descriptor).
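
To make the two readings concrete, here is a minimal Python sketch. It is not cTAKES code:
the regexes, the function names, and the naive "split after . ! ?" rule are all assumptions
made up for illustration, and a real sentence detector would be considerably more careful.

```python
import re

# Hypothetical pattern for a list marker at the start of a line: "#1", "1.", "2)".
LIST_MARKER = re.compile(r"^\s*(#\d+|\d+[.)])\s+")

# Deliberately naive sentence boundary: '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text on the naive boundary above, dropping empty pieces."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def segment_marker_attached(line):
    """Reading A: the list marker belongs to the first sentence on the line."""
    m = LIST_MARKER.match(line)
    if not m:
        return split_sentences(line)
    rest = split_sentences(line[m.end():])
    if not rest:
        return [m.group(1)]
    return [m.group(1) + " " + rest[0]] + rest[1:]


def segment_marker_separate(line):
    """Reading B: the list marker is its own segment, followed by the sentences."""
    m = LIST_MARKER.match(line)
    if not m:
        return split_sentences(line)
    return [m.group(1)] + split_sentences(line[m.end():])


if __name__ == "__main__":
    line = "1. EGD as a complex procedure. If there is an abnormality, obtain biopsies."
    print(segment_marker_attached(line))
    print(segment_marker_separate(line))
```

Under reading A the "#1 Dilated esophagus." style lines come out as one sentence each, which
matches my intuition for the first example; reading B instead treats the marker as a separate
unit that scopes over everything after it on the line.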

Does the period after the 1 matter? The number of sentences after the list header? The fact
that it's all on one line? Anything else?

