uima-user mailing list archives

From Jens Grivolla <j+...@grivolla.net>
Subject Re: InlineXMLCasConsumer fails depending on locale
Date Tue, 21 Feb 2012 17:46:23 GMT
On 02/21/2012 05:15 PM, Thilo Goetz wrote:
> On 21/02/12 16:15, Jens Grivolla wrote:
>> On 02/21/2012 04:08 PM, Thilo Goetz wrote:
>>> On 21/02/12 15:59, Jens Grivolla wrote:
>>>> it appears that InlineXMLCasConsumer depends on the system locale for
>>>> some internal transformations. The output appears to be written in UTF-8
>>>> (outStream.write(xmlAnnotations.getBytes("UTF-8"));) but when used on a
>>>> machine whose locale uses ASCII encoding, all accented characters get broken.
>>>>
>>>> I suspect that it has to do with the XMLSerializer working on a
>>>> ByteArrayOutputStream, but haven't been able to track it down yet.
>>>
>>> Have you checked that it's really the writing end where things
>>> get corrupted, and not the reading end?  Just a thought...
>>
>> Yes, we have an XmiWriterCasConsumer in parallel that works fine.
>
> Ah yes, eyeballing the source gives:
>
>        // return XML string
>        return new String(byteArrayOutputStream.toByteArray());
>
> This is in CasToInlineXml.java.  I stopped after I found this,
> maybe there's more.  Jira, patch, you know the drill :-)

https://issues.apache.org/jira/browse/UIMA-2376
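
For illustration, here is a minimal sketch of the failure mode Thilo points at. It is not the UIMA code or the patch attached to the issue: the class and method names are hypothetical, and the US-ASCII decode only simulates what new String(bytes) would do on a machine whose default charset is ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of UIMA.
class CharsetRoundTrip {

    // Decode bytes with an explicitly named charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // \u00e9 is an accented e; the escape keeps this source file pure ASCII.
        String original = "<a>caf\u00e9</a>";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Simulates new String(bytes) where the default charset is US-ASCII.
        String viaAscii = decode(utf8, StandardCharsets.US_ASCII);
        // Decoding with the same charset that produced the bytes.
        String viaUtf8 = decode(utf8, StandardCharsets.UTF_8);

        System.out.println("default charset here: " + Charset.defaultCharset());
        System.out.println("ASCII decode intact:  " + original.equals(viaAscii));
        System.out.println("UTF-8 decode intact:  " + original.equals(viaUtf8));
    }
}
```

If the bytes in the ByteArrayOutputStream are indeed UTF-8, naming the charset when building the String (or calling byteArrayOutputStream.toString("UTF-8")) should make the result independent of the locale.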

