incubator-ctakes-notifications mailing list archives

From "Steven Bethard (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CTAKES-145) inconsistent handling of upper ascii
Date Tue, 05 Feb 2013 20:13:15 GMT

    [ https://issues.apache.org/jira/browse/CTAKES-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571687#comment-13571687 ]

Steven Bethard commented on CTAKES-145:
---------------------------------------


Yes please. Anything that replaces characters instead of using the correct encoding is
just a bug waiting to happen.


Might be worth running the current models over such text just to make sure things don't break
horribly. I wouldn't expect them to, but you never know…

Steve

                
> inconsistent handling of upper ascii 
> -------------------------------------
>
>                 Key: CTAKES-145
>                 URL: https://issues.apache.org/jira/browse/CTAKES-145
>             Project: cTAKES
>          Issue Type: Task
>          Components: ctakes-preprocessor
>    Affects Versions: future enhancement
>            Reporter: James Joseph Masanz
>            Priority: Minor
>
> Currently cTAKES handles characters above ASCII 127 differently depending on whether the
> pipeline processes CDA (Clinical Document Architecture XML) or expects plain text.
> The CDA pipelines, as an early step, create a plaintext view in which each upper ASCII
> character is replaced by a blank.
> The plaintext pipelines do not do anything special for upper ASCII characters.
> Example input text for plaintext, to show this behavior: 
> His name is Gërman. Temp is 98 °C taken on the forehead
> Need to decide whether this inconsistent behavior is OK or whether we should change one
> or the other to make them consistent.
> See ClinicalNotePreProcessor.java
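
To make the difference described in the issue concrete, here is a minimal sketch of the two behaviors applied to the example sentence. It is not the code in ClinicalNotePreProcessor.java: the class name `UpperAsciiDemo` and the helpers `blankNonAscii` and `passThrough` are hypothetical, and the sketch assumes "replaced by a blank" means a one-for-one swap of each character above 127 with a single space.

```java
public class UpperAsciiDemo {

    // Hypothetical helper: imitates the CDA behavior described in the issue by
    // replacing every char above 127 with one space (assumed one-for-one, so
    // character offsets in the text are unchanged).
    static String blankNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(c > 127 ? ' ' : c);
        }
        return sb.toString();
    }

    // Hypothetical helper: imitates the plaintext behavior described in the
    // issue, where the text is left untouched.
    static String passThrough(String text) {
        return text;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        String input = "His name is G\u00ebrman. Temp is 98 \u00b0C taken on the forehead";

        // prints "His name is G rman. Temp is 98  C taken on the forehead"
        System.out.println(blankNonAscii(input));

        // The unchanged text; how it displays depends on the console encoding.
        System.out.println(passThrough(input));
    }
}
```

With blanking, "Gërman" becomes "G rman" and the degree sign disappears, so downstream annotators see different tokens for the same note depending on which pipeline produced the text.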

