incubator-ctakes-notifications mailing list archives

From "Steven Bethard (JIRA)" <>
Subject [jira] [Commented] (CTAKES-145) inconsistent handling of upper ascii
Date Tue, 05 Feb 2013 20:13:15 GMT


Steven Bethard commented on CTAKES-145:

Yes please. Anything that replaces characters instead of using the correct encoding is
just a bug waiting to happen later.

Might be worth running the current models over such text just to make sure things don't break
horribly. I wouldn't expect them to, but you never know…


> inconsistent handling of upper ascii 
> -------------------------------------
>                 Key: CTAKES-145
>                 URL:
>             Project: cTAKES
>          Issue Type: Task
>          Components: ctakes-preprocessor
>    Affects Versions: future enhancement
>            Reporter: James Joseph Masanz
>            Priority: Minor
> Currently cTAKES handles characters above ASCII 127 differently depending on whether
> the pipeline processes CDA (Clinical Document Architecture XML) or expects
> plain text.
> The CDA pipelines, as an early step, create a plaintext view in which each upper ASCII
> character is replaced by a blank.
> The plaintext pipelines do not do anything special for upper ascii characters.
> Example input text for plaintext, to show this behavior: 
> His name is Gërman. Temp is 98 °C taken on the forehead
> Need to decide whether this inconsistent behavior is OK or whether we should change one
> or the other to make them consistent.
> See
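
To make the difference described in the issue concrete, here is a minimal Python sketch, not cTAKES code. The helper name blank_upper_ascii is hypothetical, and the sketch assumes "replaced by a blank" means a one-for-one substitution of a single space for every character above code point 127. It runs the example sentence from the issue through that replacement and compares it with the untouched text that a plaintext pipeline would see.

```python
# Hypothetical helper, not part of cTAKES: mimics the CDA-style step
# described in the issue by blanking every character above code point 127.
# Assumption: one space per replaced character.
def blank_upper_ascii(text: str) -> str:
    return "".join(ch if ord(ch) <= 127 else " " for ch in text)


# Example sentence from the issue; the two non-ASCII characters are written
# as escapes (U+00EB e-diaeresis, U+00B0 degree sign) to keep this unambiguous.
sample = "His name is G\u00ebrman. Temp is 98 \u00b0C taken on the forehead"

cda_style = blank_upper_ascii(sample)  # replacement, as the CDA pipelines are described
plaintext_style = sample               # left untouched, as the plaintext pipelines are described

print(cda_style)
print(plaintext_style)
```

Under that assumption the replaced text keeps the same length as the original, so character offsets still line up, but the name and the degree sign are lost, which is the inconsistency the issue asks about.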

