camel-dev mailing list archives

From Claus Ibsen <claus.ib...@gmail.com>
Subject Re: A possible bug in IOConverter with Win-1251 charset
Date Wed, 09 Mar 2016 10:11:47 GMT
Hi

Yeah, it would be good if you can try the suggestions from Antoine. And if
you can reproduce the problem in a unit test and possibly provide a fix as a
PR / patch. We love contributions
http://camel.apache.org/contributing

On Tue, Mar 8, 2016 at 12:53 AM, Antoine Toulme <antoine@toulme.name> wrote:
> What happens is that your default charset is win-1251 while the file is UTF-8.
>
> The file is read correctly according to the charset argument passed to the toInputStream
> method; however, the charset then used to parse and send the stream is the platform default charset.
>
> The immediate workaround for you is to add an explicit charset when launching the JVM:
> -Dfile.encoding=UTF-8
>
> I would recommend you go ahead, file a bug and add a simple test case in IOConverterTest
> around line 83.
>
>> On Mar 5, 2016, at 11:05 PM, fedd <feddkraft@hotmail.com> wrote:
>>
>> I made an experiment and saw that the situation is much worse than just
>> losing one frequent Russian letter.
>>
>> I made a UTF-8 file with both Russian text and one German A-umlaut letter,
>> and Camel was unable to read the German letter, replacing it with a question
>> mark, just because my Windows dev machine's native charset happened to be
>> win-1251.
>>
>> I don't really think it's okay
>>
>> 1) to ever flatten Unicode strings to a single byte character set;
>>
>> 2) when the behaviour of the server side code depends on the host operating
>> system settings (becomes not portable)
>>
>> May I file a Jira bug report?
>>
>> Here's my route:
>>
>>        <dataFormats>
>>            <json id="jack" library="Jackson" prettyPrint="true"/>
>>        </dataFormats>
>>
>>        <route>
>>
>>            <from uri="file:///C:/tries/collApp/exchange/in?fileName=registerSampleUtf.csv&amp;charset=UTF-8"/>
>>            <log message="file: ${body.class.name} ${body}" loggingLevel="WARN"/>
>>            <unmarshal>
>>                <csv delimiter=";"  useMaps="true" />
>>            </unmarshal>
>>            <log message="unmarshalled: ${body.class.name} ${body}" loggingLevel="WARN"/>
>>            <marshal ref="jack"/>
>>            <log message="marshalled: ${body}" loggingLevel="WARN"/>
>>            <to uri="file:///C:/tries/collApp/exchange/out?fileName=out.json"/>
>>        </route>
>>
>> At the first "log", only the German letter is replaced with a question mark.
>>
>> At the second, all Russian letters are replaced with question marks as well.
>>
>> The resulting JSON can't even display the question marks when read in any of
>> the world's encodings.
>>
>> Shall I provide a test CSV file here? (warning: it contains Russian letters)
>>
>>
>>
>> --
>> View this message in context: http://camel.465427.n5.nabble.com/A-possible-bug-in-IOConverter-with-Win-1251-charset-tp5778665p5778666.html
>> Sent from the Camel Development mailing list archive at Nabble.com.
>
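
The kind of loss described in the quoted messages above can be reproduced
with plain JDK calls, without Camel. The following is only a minimal sketch
and not the IOConverter code: the class name, the method name and the sample
string are made up for illustration, and it assumes the JDK in use ships the
windows-1251 charset. It shows that text passed through a single-byte charset
keeps only the characters that charset can represent, and the rest come back
as question marks.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Camel.
class CharsetLossDemo {

    // Encode the text with the given charset, then decode the bytes with the
    // same charset. String.getBytes(Charset) substitutes the charset's
    // replacement byte for unmappable characters, which is '?' for windows-1251.
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Made-up sample: a Russian word followed by a German A-umlaut (U+00C4)
        String sample = "\u041F\u0440\u0438\u0432\u0435\u0442 \u00C4";
        Charset win1251 = Charset.forName("windows-1251");

        // What the JVM falls back to when no charset is given explicitly
        System.out.println("default charset: " + Charset.defaultCharset());

        // UTF-8 can represent every character, so nothing is lost
        System.out.println("UTF-8 lossless: "
                + roundTrip(sample, StandardCharsets.UTF_8).equals(sample));

        // windows-1251 keeps the Cyrillic letters but cannot represent U+00C4
        System.out.println("windows-1251 lossless: "
                + roundTrip(sample, win1251).equals(sample));
    }
}
```

Running it with and without -Dfile.encoding=UTF-8 should show how the first
printed line depends on the JVM and host settings, which is the portability
concern raised in the thread.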



-- 
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2
