ctakes-notifications mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CTAKES-227) Broca's -> PunctuationToken instead of ContractionToken - caused by apostrophe seen as sentence ending
Date Mon, 26 Aug 2013 18:51:54 GMT

    [ https://issues.apache.org/jira/browse/CTAKES-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750399#comment-13750399 ]

ASF subversion and git services commented on CTAKES-227:

Commit 1517639 from james-masanz@apache.org in branch 'ctakes/trunk'
[ https://svn.apache.org/r1517639 ]

CTAKES-227 - don't have apostrophe in list of potential sentence endings due to too many false
> Broca's -> PunctuationToken instead of ContractionToken - caused by apostrophe seen
as sentence ending
> ------------------------------------------------------------------------------------------------------
>                 Key: CTAKES-227
>                 URL: https://issues.apache.org/jira/browse/CTAKES-227
>             Project: cTAKES
>          Issue Type: Bug
>          Components: ctakes-core
>    Affects Versions: 3.1
>            Reporter: James Joseph Masanz
>            Assignee: James Joseph Masanz
> The recently rebuilt sentence detector (currently in trunk and the 3.1.0 branch) is sometimes
taking the apostrophe as a sentence break where the ctakes-3.0.0-incubating model didn’t.
> The training data used for the recently rebuilt model contains only 7 lines that
end with an apostrophe (single quote) followed immediately by a newline.
> It has >100K occurrences of 's
> It has >175K occurrences of the ' character in all.
> The place I noticed this is in testfakenote.txt.xml in ctakes-regression-test.
> The word "Broca's" used to have a ContractionToken, but since a sentence is now ending
on the apostrophe, the apostrophe is getting annotated as a PunctuationToken.
> See more in the thread started at
> http://markmail.org/message/wavipejszlspzo5u
> including examples that split correctly and incorrectly.
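
The three apostrophe counts quoted in the description (lines ending with an apostrophe, occurrences of 's, and all ' characters) could be gathered with something like the sketch below. This is only an illustration: the function name apostrophe_stats and the sample lines are hypothetical and are not taken from the cTAKES code base or its training data.

```python
def apostrophe_stats(lines):
    """Return (lines ending in ', occurrences of 's, total ' characters)."""
    ends_with_apostrophe = 0  # lines whose last character before the newline is '
    possessive_s = 0          # plain substring count of 's (also matches a quoted word starting with s)
    total_apostrophes = 0     # every ' character in the text
    for line in lines:
        text = line.rstrip("\r\n")
        if text.endswith("'"):
            ends_with_apostrophe += 1
        possessive_s += text.count("'s")
        total_apostrophes += text.count("'")
    return ends_with_apostrophe, possessive_s, total_apostrophes


# Made-up sample lines, for illustration only.
sample = [
    "Lesion near Broca's area.\n",
    "Patient described the pain as 'dull'\n",
    "No change since the patient's last visit.\n",
]
stats = apostrophe_stats(sample)
print(stats)
```

When line-final apostrophes are as rare relative to 's as the description reports, counts like these make it easier to see why treating the apostrophe as a candidate sentence ending produces mostly false positives.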

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
