camel-users mailing list archives

From Claus Ibsen <claus.ib...@gmail.com>
Subject Re: FileConsumer always reads the data in system charset/encoding
Date Wed, 24 Mar 2010 15:41:00 GMT
Hi

Use
.convertBodyTo(String.class, "utf-8") right after the from("file:xxx") to
control the charset used when the file content is converted to a String.
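
To illustrate why naming the charset at conversion time matters, here is a
minimal sketch using only JDK classes. It is not Camel code: the class
CharsetDemo and its decode helper are hypothetical names made up for this
example, and the sample text is invented. It decodes the same ISO-8859-1
bytes once with the right charset and once as UTF-8, which reproduces the
kind of corruption described in the quoted message below.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Camel.
class CharsetDemo {

    // Decode raw bytes with an explicitly named charset
    // instead of relying on the platform default.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "caf\u00e9" as an ISO-8859-1 writer would store it:
        // the accented letter is the single byte 0xE9.
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        String right = decode(raw, "ISO-8859-1");
        String wrong = decode(raw, "UTF-8");

        // Compare both results against the original text.
        System.out.println("ISO-8859-1 round trip intact: " + right.equals("caf\u00e9"));
        System.out.println("UTF-8 decode intact: " + wrong.equals("caf\u00e9"));
    }
}
```

In a route, the same choice is made by the second argument of convertBodyTo,
so for files written as ISO-8859-1 one would presumably pass "ISO-8859-1"
there rather than "utf-8".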



On Wed, Mar 24, 2010 at 4:25 PM, Kevin Jackson <foamdino@gmail.com> wrote:
> Hi,
>
> I have a Camel application deployed on RHEL5 with a default
> encoding/locale of UTF-8.
>
> I have to download data from a remote Windows server (CP1251 or
> ISO-8859-1/latin-1).
>
> My route breaks down the processing of the files into two steps:
> 1 - download
> 2 - consume and pass split/tokenized String/bytes to POJOs for further
> processing.
>
> My problem stems from the fact that I don't seem to have control over
> the charset that the FileConsumer uses as it converts the file into a
> String.  The data contains encoded chars which are corrupted if the
> data is read as UTF-8 instead of as ISO-8859-1.
>
> I have a simple test case of a file encoded as ISO-8859-1 and I can
> read it with a specific charset and this allows me to process the data
> without corruption.  If I read it as UTF-8, the data is corrupted.
>
> Is there any way I can instruct each of my FileConsumer endpoints to
> consume the file using a specific charset/encoding?  I cannot change
> the locale on the server to fix this as other files must be read as
> UTF-8, not ISO-8859-1
>
> I've looked at the camel source code and the way that camel consumes
> files seems to rely on some kind of type coercion:
>
> in FileBinding:
>    public void loadContent(Exchange exchange, GenericFile<File> file) throws IOException {
>        try {
>            content = exchange.getContext().getTypeConverter().mandatoryConvertTo(byte[].class, file.getFile());
>        } catch (NoTypeConversionAvailableException e) {
>            throw IOHelper.createIOException("Cannot load file content: " + file.getAbsoluteFilePath(), e);
>        }
>    }
>
> Is this the code that actually consumes the file and creates the
> message, or should I be looking elsewhere?  I'm trying to add a
> property to GenericFileEndpoint that will allow me to set a parameter
> via the URI:
> file://target/encoded/?charsetEncoding=ISO-8859-1
>
> Thanks,
> Kev
>



-- 
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus
