camel-users mailing list archives

From Burkard Stephan
Subject Re: Charset on file poller endpoint
Date Tue, 09 May 2017 14:28:39 GMT
Just to document this for others with the same problem: when the body is passed as a byte array,
the bytes are correct.

    public String detectEncodingByBom(@Body byte[] body) {
        byte[] firstThreeBytes = Arrays.copyOfRange(body, 0, 3);
        log.debug("3 Bytes as Hex: " + Hex.encodeHexString(firstThreeBytes));

The log output for a UTF-16LE file is "fffe3c". This is the correct BOM (FFFE) and the first
byte of the first character "<".
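
The BOM check described above could be completed along these lines. This is only a sketch: the class name `BomSniffer` and the method `detectByBom` are hypothetical, and only the UTF-8, UTF-16LE and UTF-16BE BOMs are checked.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {

    /**
     * Returns the charset implied by a leading BOM, or null if none of the
     * three BOMs checked here is present.
     * Limitation: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
     */
    public static Charset detectByBom(byte[] body) {
        if (startsWith(body, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(body, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        if (startsWith(body, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        return null;
    }

    // Compares the first bytes of data with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For the UTF-16LE file mentioned above (bytes ff fe 3c ...), `detectByBom` would return `StandardCharsets.UTF_16LE`.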


-----Original Message-----
From: Burkard Stephan
Sent: Thursday, 4 May 2017 16:08
Subject: Re: Charset on file poller endpoint

Yes, a Bean is probably the best way to do the work. 

However, I tried to inject the exchange, get the body as an InputStream and read the first 4
bytes from the body (because an InputStream is a byte representation and therefore not encoded).
When I read a file that is UTF-16 (big-endian) encoded, I get the output "Hex: efbfbdef".

    public void determineEncoding(Exchange exchange) throws Exception {
        InputStream is = exchange.getIn().getBody(InputStream.class);
        DataInputStream dis = new DataInputStream(is);
        int fourBytes = dis.readInt();
        String hex = Integer.toHexString(fourBytes);
        log.debug("Hex: " + hex);
    }

But when I read the file directly, I get the output "Hex: feff003c".

    public void testUtf16BeBom() throws Exception {
        InputStream utf16FileStream = this.getClass().getClassLoader().getResourceAsStream("testfiles/XmlUtf16Be.xml");
        DataInputStream dis = new DataInputStream(utf16FileStream);
        int fourBytes = dis.readInt();
        String hex = Integer.toHexString(fourBytes);
        log.debug("Hex: " + hex);
    }

The output of the direct read is correct since "feff" is the UTF-16 BE BOM, followed by "003c"
which is the first character "<" in a 2-byte representation. 

Any idea why the output through the Camel route/Bean is wrong? Is it because the body has
already been encoded (with a wrong encoding)?
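
One plausible explanation, not confirmed in this thread: "efbfbd" is the UTF-8 encoding of the Unicode replacement character U+FFFD. That pattern appears when bytes that are not valid UTF-8 (such as the BOM bytes FE FF) are decoded to a String as UTF-8 and then encoded back to bytes. The sketch below reproduces the effect with plain Java. The class and method names are hypothetical, and it does not show what Camel does internally.

```java
import java.nio.charset.StandardCharsets;

public class LossyRoundTrip {

    /** Decodes raw bytes as UTF-8 and re-encodes them, as an intermediate String conversion would. */
    public static byte[] viaUtf8String(byte[] raw) {
        // The String constructor replaces every malformed byte sequence with U+FFFD.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    /** Hex dump of at most the first count bytes. */
    public static String toHex(byte[] bytes, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count && i < bytes.length; i++) {
            sb.append(String.format("%02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf16Be = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x3C}; // UTF-16BE BOM + "<"
        System.out.println(toHex(utf16Be, 4));                   // prints feff003c
        System.out.println(toHex(viaUtf8String(utf16Be), 4));    // prints efbfbdef
    }
}
```

The second output matches the value seen in the route, which suggests the bytes went through a String somewhere before the bean read them.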


-----Original Message-----
From: souciance
Sent: Thursday, 4 May 2017 12:13
Subject: Re: Charset on file poller endpoint

Probably the easiest is to read the file and send the exchange to a bean.
In the bean try to read it and determine the encoding and if it has a BOM character. Finally
do your conversion and put the body back to the exchange.

.bean(DetermineEncoding.class, "determineEncoding")
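
The conversion step inside such a bean might look roughly like this. It is a minimal sketch in plain Java without Camel types: `Utf8Normalizer` and `toUtf8WithoutBom` are hypothetical names, and input without a BOM is assumed to be UTF-8 already, which will not hold for BOM-less UTF-16 files.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8Normalizer {

    /**
     * Re-encodes BOM-prefixed UTF-8 or UTF-16 input as UTF-8 without a BOM.
     * Assumption: input without a BOM is already UTF-8 and is returned unchanged.
     */
    public static byte[] toUtf8WithoutBom(byte[] body) {
        Charset source;
        int bomLength;
        if (hasPrefix(body, 0xEF, 0xBB, 0xBF)) {
            source = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (hasPrefix(body, 0xFE, 0xFF)) {
            source = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (hasPrefix(body, 0xFF, 0xFE)) {
            source = StandardCharsets.UTF_16LE;
            bomLength = 2;
        } else {
            return body;
        }
        // Decode everything after the BOM with the detected charset, then encode as UTF-8.
        String text = new String(body, bomLength, body.length - bomLength, source);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Compares the first bytes of data with the given unsigned byte values.
    private static boolean hasPrefix(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Working on a byte[] rather than a String keeps the detection independent of any charset conversion that happened earlier in the route.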

On Thu, May 4, 2017 at 12:01 PM, Burkard Stephan [via Camel] wrote:

> Hi Camel users
> I read files with a Camel file poller and they can have different 
> encodings (UTF-8 with or without BOM, UTF-16). Therefore I would like 
> to determine the given encoding and convert the message body to UTF-8 
> without BOM for the further processing.
> How can I do this, and what exactly is in the message payload
> of the exchange? Is the payload an InputStream (just bytes, no
> encoding), or has it already been converted to a String or a Reader (already encoded)?
> And what does the "charset" option change? Does it overwrite the 
> default encoding of the operating system?
> from("file:/myDir")
>     // can I read the first bytes of the file here?
>     .to("activemq:queue:myQueue")
> Thanks for any hints
> Stephan