ctakes-dev mailing list archives

From Britt Fitch <britt.fi...@gmail.com>
Subject Re: question about sentence segmentation
Date Mon, 14 Jul 2014 12:29:19 GMT
My preference is to treat the list row number as outside of the sentence of
interest.
Or, if it must be included in a sentence, have it be a sentence on its
own.
That won't be as straightforward as splitting on a period in cases like
"2. Magnesium oxide 400 mg p.o. daily.", where periods also follow the
list number and appear inside the abbreviation "p.o."
In cases where there is more than one written sentence, like your example
in the original email, I'd prefer that each be its own sentence rather than
making the entire list line a single sentence.
My feeling is that each line without terminating punctuation would be a
single sentence and would exclude the list number.
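
To make that preference concrete, here is a minimal Python sketch of the
behavior I have in mind. It is not cTAKES code: the function name, the list
marker pattern, and the tiny abbreviation set are all hypothetical and only
meant to illustrate the idea.

```python
import re

# Hypothetical list-marker pattern: "2.", "12.", "3)", "#1", "#2.".
# A bare number with no "#" prefix or trailing punctuation (e.g. "400 mg")
# is deliberately not treated as a marker.
LIST_MARKER = re.compile(r"^\s*(#\d+[.)]?|\d+[.)])\s+")

# Illustrative abbreviation set (an assumption); a real clinical list
# would be much larger.
ABBREVIATIONS = {"p.o.", "b.i.d.", "q.d.", "i.v."}


def segment_list_line(line):
    """Return (marker, sentences) for a single list line.

    The marker is kept outside the sentences. The rest of the line is
    split after tokens ending in '.', '?' or '!', unless the token is a
    known abbreviation. A line with no terminating punctuation yields
    one sentence.
    """
    marker = None
    match = LIST_MARKER.match(line)
    if match:
        marker = match.group(1)
        line = line[match.end():]

    sentences = []
    current = []
    for token in line.split():
        current.append(token)
        ends_sentence = token.endswith((".", "?", "!"))
        if ends_sentence and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing text without terminating punctuation is one sentence.
        sentences.append(" ".join(current))
    return marker, sentences


if __name__ == "__main__":
    for example in [
        "2. Magnesium oxide 400 mg p.o. daily.",
        "1. EGD as a complex procedure. If there is an abnormality, obtain biopsies.",
        "#2 Adenocarcinoma",
    ]:
        print(segment_list_line(example))
```

Returning the marker separately leaves it to the caller whether to drop it
or to emit it as a sentence of its own.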

As an aside, I have encountered several issues with numbered lists being
interpreted differently depending on:
1. What number is included at the start.
For example: "2. Magnesium oxide 400 mg p.o. daily." vs "12. Magnesium
oxide 400 mg p.o. daily." (This appears to be a chunking issue: the
line starting with "12. Magnesium" is identified as starting with chunks [O,
O, B-NP, B-NP, I-NP, B-NP, B-ADVP, O] even though the parts of speech
appear to be correct.)
2. Whether there is a period at the end of the list item.
For example: "4. CHF" vs "4. CHF." (This also appears to be an issue with
the chunker, which produces [O, O] in the first case and [B-VP, B-NP, O]
in the second.)

Cheers,

Britt



On Mon, Jul 14, 2014 at 7:50 AM, Miller, Timothy <
Timothy.Miller@childrens.harvard.edu> wrote:

> Just curious about an edge case regarding headers/lists and wondering what
> people think the correct behavior and annotation are.
>
> In cases like this:
>
> #1 Dilated esophagus.
> #2 Adenocarcinoma
>
> my intuition is that each whole line is one sentence. But then there are
> cases where the number may be followed by multiple sentences on one line.
> 1. EGD as a complex procedure. If there is an abnormality, obtain biopsies.
>
> For this example my intuition is not as clear. Should there be a break
> after the "1." or should the first sentence be "1. EGD as a complex
> procedure."? Again, my intuition leans towards the latter but it seems a
> bit odd since the "1." kind of distributes over all the following sentences
> (i.e. it's like a paragraph descriptor.)
>
> Does the period after the 1 matter? The number of sentences after the list
> header? The fact that it's all on one line? Anything else?
>
> Tim
>
