incubator-ctakes-notifications mailing list archives

From "James Joseph Masanz (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CTAKES-145) inconsistent handling of upper ascii
Date Tue, 05 Feb 2013 16:36:12 GMT
James Joseph Masanz created CTAKES-145:
------------------------------------------

             Summary: inconsistent handling of upper ascii 
                 Key: CTAKES-145
                 URL: https://issues.apache.org/jira/browse/CTAKES-145
             Project: cTAKES
          Issue Type: Task
          Components: ctakes-preprocessor
    Affects Versions: future enhancement
            Reporter: James Joseph Masanz
            Priority: Minor


Currently cTAKES handles characters above ASCII 127 differently depending on whether the pipeline
processes CDA (Clinical Document Architecture XML) or expects plain text.

The CDA pipelines, as an early step, create a plaintext view in which each upper ASCII character
is replaced by a blank.

The plaintext pipelines do not do anything special for upper ascii characters.

Example input text for plaintext, to show this behavior: 
His name is Gërman. Temp is 98 °C taken on the forehead
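
The difference can be sketched in Java as follows. This is only a minimal illustration, not the
actual code in ClinicalNotePreProcessor.java: the class name UpperAsciiSketch and the method
blankNonAscii are hypothetical, and the sketch assumes "replaced by a blank" means a one-for-one
substitution of a single space, which leaves character offsets unchanged.

```java
// Hypothetical sketch; not taken from the cTAKES source.
class UpperAsciiSketch {

    // Mimics the CDA-style behavior described above: every char above 127
    // becomes a single space, so the text length and offsets do not change.
    // Works per UTF-16 char, so a surrogate pair would become two spaces.
    static String blankNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(c > 127 ? ' ' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same example as above, written with escapes to avoid source-encoding issues.
        String input = "His name is G\u00ebrman. Temp is 98 \u00b0C taken on the forehead";

        // CDA-style view: upper ASCII characters blanked.
        System.out.println(blankNonAscii(input));

        // Plaintext-style view: the text passes through untouched.
        System.out.println(input);
    }
}
```

With a substitution like this, the CDA-style view would contain "G rman" and "98  C" (two blanks),
while the plaintext view would keep the original characters.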

We need to decide whether this inconsistent behavior is acceptable, or whether we should change
one or the other to make them consistent.

See ClinicalNotePreProcessor.java


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
