camel-users mailing list archives

From Gustav Sinder <gustav.sin...@ferrologic.se>
Subject RE: Wrong charset when using FTP2 component, locale issue?
Date Tue, 22 Sep 2015 10:28:40 GMT
I finally found a solution to this: use the same charset (iso-8859-1) for convertBodyTo as
for the ftp endpoint, i.e.:
<convertBodyTo type="java.lang.String" charset="iso-8859-1"/>
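As a rough illustration of why an explicit charset helps, here is a minimal plain-Java sketch. It is independent of Camel, and the class and method names are hypothetical. It only shows that decoding ISO-8859-1 bytes with an explicitly named charset yields the expected UTF-8 output regardless of the platform default:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Camel or of the routes in this thread.
class ExplicitCharsetDemo {

    // Render bytes as lowercase hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode ISO-8859-1 bytes with an explicit charset, then re-encode as UTF-8.
    static String latin1ToUtf8Hex(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return toHex(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] oUmlaut = {(byte) 0xF6}; // o-umlaut in ISO-8859-1
        System.out.println(latin1ToUtf8Hex(oUmlaut)); // prints c3b6
    }
}
```

Because the charset is named at the point of conversion, the result does not depend on the locale the JVM was started under.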

Kind regards
/Gustav

-----Original Message-----
From: Gustav Sinder [mailto:gustav.sinder@ferrologic.se] 
Sent: den 2 juli 2015 13:48
To: users@camel.apache.org
Subject: RE: Wrong charset when using FTP2 component, locale issue?

I realize I should probably provide the full picture here:

The context consists of two routes where the first:
-----
<from uri="<my ftp including the binary mode and charset set>"/>
<to uri="direct-vm:another-route-that-returns nothing?timeout=300000"/>

<!-- Needed for the splitter -->
<convertBodyTo type="java.lang.String"/>
<split streaming="true">
	<tokenize token="\n" group="5000"/>
	<wireTap uri="activemq:myQueue"/>
</split>
-----
And second:
-----
<from uri="activemq:myQueue"/>
<unmarshal>
	<csv delimiter=";"/>
</unmarshal>
<bean ref="transformCSV" method="validateAndTransform"/>
-----

After a lot of troubleshooting it seems that it's the splitter/tokenizer that messes up the
data. It looks correct after the convertBodyTo but doesn't look ok after the tokenizer statement.

Is the tokenizer doing anything here that I should be aware of?

Thanks
/Gustav

-----Original Message-----
From: Gustav Sinder [mailto:gustav.sinder@ferrologic.se]
Sent: den 2 juli 2015 09:57
To: users@camel.apache.org
Subject: Wrong charset when using FTP2 component, locale issue?

Hi,

I've got an issue with files being parsed differently in different environments...specifically
handling Swedish characters.

The ftp endpoint is configured with:

- charset=iso-8859-1 (matches file format)
- binary=true

For debug purposes, I'm writing the data (in UTF-8) from a java bean. My local environment
correctly outputs (hex) c3b6 for 'ö'.
Our test environment outputs (hex) efbfbdefbfbd, which is clearly based on erroneously parsed
data.
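The hex value efbfbdefbfbd is the UTF-8 encoding of two U+FFFD replacement characters. One way to reproduce that exact pattern, offered only as a possible explanation and not as a confirmed description of what Camel does here, is to decode the UTF-8 bytes c3b6 with an ASCII-only charset and then write the result back out as UTF-8. A minimal plain-Java sketch, with hypothetical class and method names:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it reproduces the hex pattern, not the Camel route.
class ReplacementCharDemo {

    // Render bytes as lowercase hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encode as UTF-8, misread the bytes as US-ASCII, then re-encode as UTF-8.
    // Each non-ASCII byte is replaced by U+FFFD during the US-ASCII decode.
    static String garbleViaAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.US_ASCII);
        return toHex(misread.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(garbleViaAscii("\u00f6")); // prints efbfbdefbfbd
    }
}
```

The two-byte UTF-8 sequence becomes two replacement characters, which matches the six bytes seen in the test environment. Once the replacement has happened, the original character cannot be recovered downstream.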

Since the deployed code and test files are identical, is this an issue with Camel and the underlying
system/locale?
I'm using Apache Camel 2.12.0.redhat-610379 (as part of JBoss Fuse).

My local (Linux) environment uses locale UTF-8:
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

Our test (Linux) environment uses POSIX:
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
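To compare what the JVM itself picks up in each environment, a small check like the following could be run on both machines. This is a sketch with a hypothetical class name. The assumption, which should be verified on the actual systems, is that on older JDKs a POSIX locale typically leads to an ASCII default charset while en_US.UTF-8 leads to UTF-8:

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class; run it with the same JVM and environment as the container.
class DefaultCharsetCheck {

    // Report the JVM default charset and the file.encoding system property.
    static String describe() {
        return "default charset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the two environments report different values, any conversion that does not name a charset explicitly may behave differently between them.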

Thanks
/Gustav
