uima-user mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: InlineXMLCasConsumer fails depending on locale
Date Tue, 21 Feb 2012 16:15:23 GMT
On 21/02/12 16:15, Jens Grivolla wrote:
> On 02/21/2012 04:08 PM, Thilo Goetz wrote:
>> On 21/02/12 15:59, Jens Grivolla wrote:
>>> it appears that InlineXMLCasConsumer depends on the system locale for
>>> some internal transformations. The output appears to be written in UTF8
>>> (outStream.write(xmlAnnotations.getBytes("UTF-8"));) but when used on a
>>> machine with a locale of ASCII all accented characters get broken.
>>>
>>> I suspect that it has to do with the XMLSerializer working on a
>>> ByteArrayOutputStream, but haven't been able to track it down yet.
>>
>> Have you checked that it's really the writing end where things
>> get corrupted, and not the reading end?  Just a thought...
> 
> Yes, we have an XmiWriterCasConsumer in parallel that works fine.
> 
> Jens
> 

Ah yes, eyeballing the source gives:

      // return XML string
      return new String(byteArrayOutputStream.toByteArray());

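This is in CasToInlineXml.java.  new String(byte[]) without a charset
decodes with the platform default encoding, which would explain the
breakage under an ASCII locale.  I stopped after I found this, so
there may be more.  Jira, patch, you know the drill :-)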
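
For the patch, here is a minimal sketch of the kind of change the line above seems to call for, assuming the buffer really holds UTF-8 bytes (as the getBytes("UTF-8") call in the original report suggests). The class and method names are hypothetical and not taken from CasToInlineXml.java; the second method only imitates what new String(byte[]) would do if the platform default were the given charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of UIMA.
class CharsetRoundTrip {

    // Fills a buffer with the UTF-8 encoding of s, standing in for the
    // serializer output (assumption: the serializer writes UTF-8).
    static ByteArrayOutputStream utf8Buffer(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(bytes, 0, bytes.length);
        return out;
    }

    // Proposed style of fix: name the charset explicitly so the result
    // does not depend on the system locale.
    static String decodeUtf8(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Imitates new String(byte[]) on a machine whose default charset is
    // platformDefault, to reproduce the reported corruption.
    static String decodeAs(ByteArrayOutputStream out, Charset platformDefault) {
        return new String(out.toByteArray(), platformDefault);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // contains one accented character
        ByteArrayOutputStream out = utf8Buffer(original);

        System.out.println(decodeUtf8(out).equals(original));                          // true
        System.out.println(decodeAs(out, StandardCharsets.US_ASCII).equals(original)); // false
    }
}
```

On a pre-Java-7 runtime, byteArrayOutputStream.toString("UTF-8") should give the same result as decodeUtf8 without needing StandardCharsets.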

--Thilo
