camel-users mailing list archives

From "James Strachan" <james.strac...@gmail.com>
Subject Re: XMLConverter and default charset
Date Thu, 06 Mar 2008 12:17:54 GMT
On 06/03/2008, Arjan Moraal <nabble@ajmoraal.fastmail.net> wrote:
>
>  The org.apache.camel.converter.jaxp.XMLConverter class has a method to
>  convert a String to a DOM Document. This method is called automatically when,
>  for instance, an XPath expression is evaluated on a TextMessage received from
>  JMS.

I guess when sending XML that isn't UTF-8 encoded, folks should send a
BytesMessage instead?
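
A minimal sketch of the BytesMessage route, assuming the message body has already been read into a byte[] (the JMS calls are left out). The class and method names are hypothetical. The idea is to hand the raw bytes to the parser unchanged, so it picks the encoding from the XML declaration (or BOM) rather than from the platform default:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class BytesToDomSketch {

    // Hypothetical helper: pass the bytes through untouched so the XML parser
    // detects the encoding itself instead of relying on the JVM default charset.
    public static Document parseBytes(byte[] body)
            throws IOException, SAXException, ParserConfigurationException {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(body));
    }

    public static void main(String[] args) throws Exception {
        // A document that really is ISO-8859-1 encoded and declares it.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><msg>caf\u00e9</msg>";
        byte[] body = xml.getBytes(StandardCharsets.ISO_8859_1);

        String text = parseBytes(body).getDocumentElement().getTextContent();
        System.out.println(text.equals("caf\u00e9")); // true
    }
}
```

This only works if the sender's bytes actually match the declared encoding; it moves responsibility for the encoding to the producer of the message.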


>
>     @Converter
>     public Document toDOMDocument(String text) throws IOException,
>  SAXException, ParserConfigurationException {
>         return toDOMDocument(text.getBytes());
>     }
>
>  The problem with this is that the String is converted to a byte[] using the
>  default character encoding of the platform (in my case CP-1252 on
>  Windows XP). But the XML in the text message might declare a different
>  encoding in its header (<?xml version="1.0" encoding="UTF-8"?>), which can
>  cause SAXParser exceptions (such as "Invalid byte 1 of 1-byte UTF-8 sequence").
>
>  So shouldn't this toDOMDocument() method either use the encoding declared in
>  the XML to convert the String to a byte[], or change the encoding attribute
>  in the XML header to match the character encoding used to generate the
>  byte[]?

Great catch!

I've modified the code so that we parse the XML in the String using a
StringReader/InputSource instead, which avoids converting to and from
bytes altogether. Do you think that should help?
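
Roughly, a Reader-based conversion could look like the sketch below; this is not the actual XMLConverter code, and the class and method names are made up for illustration. According to the org.xml.sax.InputSource contract, when a character stream is supplied the parser reads it directly and disregards the encoding declaration, so the platform default charset never comes into play. The byte-based variant is included only for comparison, with ISO-8859-1 standing in for a single-byte default such as CP-1252.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class StringToDomSketch {

    private static DocumentBuilder newBuilder() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Reader-based variant: the String is already decoded characters, so it is
    // handed to the parser as a character stream and never turned into bytes.
    public static Document toDOMDocument(String text)
            throws IOException, SAXException, ParserConfigurationException {
        return newBuilder().parse(new InputSource(new StringReader(text)));
    }

    // Byte-based variant for comparison, with the charset made explicit.
    // The parser decodes these bytes according to the XML declaration, so a
    // charset that disagrees with the declaration can break parsing.
    public static Document toDOMDocumentViaBytes(String text, Charset charset)
            throws IOException, SAXException, ParserConfigurationException {
        return newBuilder().parse(new ByteArrayInputStream(text.getBytes(charset)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><msg>caf\u00e9</msg>";

        // Reader-based parsing keeps the non-ASCII character intact.
        String text = toDOMDocument(xml).getDocumentElement().getTextContent();
        System.out.println(text.equals("caf\u00e9")); // true

        // Encoded as ISO-8859-1, 'é' becomes the lone byte 0xE9, which is not
        // a valid UTF-8 sequence, so parsing against the UTF-8 declaration fails.
        try {
            toDOMDocumentViaBytes(xml, StandardCharsets.ISO_8859_1);
            System.out.println("parsed");
        } catch (SAXException | IOException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

The exact exception message depends on the JDK's bundled parser, and its default error handler may also print a fatal error line to stderr.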


-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com
