incubator-ctakes-notifications mailing list archives

From "Andy McMurry (JIRA)" <>
Subject [jira] [Commented] (CTAKES-145) inconsistent handling of upper ascii
Date Tue, 05 Feb 2013 20:17:15 GMT


Andy McMurry commented on CTAKES-145:

In a sense, cTAKES supports only the formats that it was trained on.
I suggest alerting the user if an unsupported format is detected.

It's a hard problem to wrap your head around as an end user,
all the more so when the user is not even aware of the non-standard character translation.

Ideally, cTAKES would work as normal when the same format is in use as the training set. 
If an unsupported format is detected, log a warning (once per document) with a pointer to
how to "correct" the issue.
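
The once-per-document warning suggested above could look roughly like the sketch below. It assumes that "unsupported format" can be approximated by the presence of any character above ASCII 127, which is only a proxy for what the models were trained on. The class NonAsciiWarning, its methods, and the docId parameter are hypothetical names for illustration and are not part of cTAKES.

```java
import java.util.logging.Logger;

// Hypothetical helper for illustration only; not part of cTAKES.
final class NonAsciiWarning {
    private static final Logger LOG = Logger.getLogger(NonAsciiWarning.class.getName());

    /** Returns the offset of the first char above 127, or -1 if the text is 7-bit ASCII. */
    static int firstNonAscii(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 127) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Logs at most one warning for the given document, no matter how many
     * non-ASCII characters it contains. Returns true if a warning was logged.
     */
    static boolean warnOnce(String docId, String text) {
        int offset = firstNonAscii(text);
        if (offset < 0) {
            return false; // assumed to match the training format; stay silent
        }
        LOG.warning(String.format(
                "Document %s: character U+%04X at offset %d is above ASCII 127; "
                        + "results may differ from the training data.",
                docId, (int) text.charAt(offset), offset));
        return true;
    }
}
```

Calling NonAsciiWarning.warnOnce once at the start of a pipeline, before any text is rewritten, would keep the log to a single line per document; a real message would also point to instructions for correcting the input.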


On Feb 5, 2013, at 12:03 PM, "Masanz, James J." <> wrote:

> inconsistent handling of upper ascii 
> -------------------------------------
>                 Key: CTAKES-145
>                 URL:
>             Project: cTAKES
>          Issue Type: Task
>          Components: ctakes-preprocessor
>    Affects Versions: future enhancement
>            Reporter: James Joseph Masanz
>            Priority: Minor
> Currently cTAKES handles characters above ASCII 127 differently depending on whether the pipeline processes CDA (Clinical Document Architecture XML) or expects plain text.
> The CDA pipelines, as an early step, create a plaintext view in which each upper-ASCII character is replaced by a blank.
> The plaintext pipelines do not do anything special for upper-ASCII characters.
> Example input text for plaintext, to show this behavior: 
> His name is Gërman. Temp is 98 °C taken on the forehead
> Need to decide whether this inconsistent behavior is OK or whether we should change one or the other to make them consistent.
> See
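
To make the difference between the two pipelines concrete, here is a minimal sketch of the CDA-style behavior described in the issue: every character above ASCII 127 becomes a single blank. It is not the actual cTAKES preprocessor code; the class UpperAsciiDemo and the method blankUpperAscii are hypothetical names, and the one-for-one replacement is an assumption based on the wording "replaced by a blank".

```java
// Hypothetical illustration; not the cTAKES preprocessor implementation.
final class UpperAsciiDemo {

    /** Replaces each char above 127 with one blank, so the text length is unchanged. */
    static String blankUpperAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(c > 127 ? ' ' : c);
        }
        return sb.toString();
    }
}
```

Applied to the example input, "Gërman" would become "G rman" and "98 °C" would become "98  C" (two blanks), whereas a plaintext pipeline would pass both through unchanged. That is the inconsistency the issue asks about.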

