ctakes-dev mailing list archives

From "Masanz, James J." <Masanz.Ja...@mayo.edu>
Subject RE: apostrophe and sentence detector
Date Mon, 26 Aug 2013 16:29:16 GMT
The training data is one sentence per line.
That's how you feed data to the sentence detector.
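
To check how many training lines end with an apostrophe, a minimal sketch along these lines may help. It assumes the one-sentence-per-line format described above. The function name and the sample sentences are made up for illustration and are not part of cTAKES or its training data.

```python
def count_lines_ending_with_apostrophe(lines):
    """Count lines whose last non-whitespace character is an apostrophe.

    Assumes one sentence per line. Both the ASCII apostrophe and the
    typographic right single quote (U+2019) are counted.
    """
    apostrophes = ("'", "\u2019")
    return sum(1 for line in lines if line.rstrip().endswith(apostrophes))


# Hypothetical sample lines, not taken from the real training set.
sample = [
    "Patient denies chest pain.",
    "Pt states 'I feel fine'",
    "History of present illness:",
    "",
]
print(count_lines_ending_with_apostrophe(sample))
```

For a real training file, the same function could be called on an open file handle, since iterating over a file yields one line at a time.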

-----Original Message-----
From: dev-return-1884-Masanz.James=mayo.edu@ctakes.apache.org [mailto:dev-return-1884-Masanz.James=mayo.edu@ctakes.apache.org] On Behalf Of Tim Miller
Sent: Monday, August 26, 2013 11:12 AM
To: dev@ctakes.apache.org
Subject: Re: apostrophe and sentence detector

On 08/26/2013 12:05 PM, Masanz, James J. wrote:
> The recently rebuilt sentence detector (currently in trunk and the 3.1.0 branch) is sometimes
> taking the apostrophe as a sentence break where the ctakes-3.0.0-incubating model didn't.
> The training data used for the recently rebuilt model contains only 7 lines that
> end with an apostrophe (single quote).
Do you mean 7 sentences that end in a single apostrophe or 7 lines? The 
sentence detector will currently break on newlines no matter what, so 
the important number is how many sentences end mid-line with an 
apostrophe, right?
