camel-issues mailing list archives

From "Robert Half (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CAMEL-11846) xtokenize and apply xslt to a string does not work with UTF-16BE
Date Thu, 09 Nov 2017 12:39:00 GMT

    [ https://issues.apache.org/jira/browse/CAMEL-11846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245573#comment-16245573 ]

Robert Half edited comment on CAMEL-11846 at 11/9/17 12:38 PM:
---------------------------------------------------------------

!my  example looks like this (and  it's really UTF-16BE).png!

Camel cannot parse it out of the box, because Camel tells the XMLStreamReader to use
UTF-8 as the encoding, so parsing fails on the very first bytes, which are the BOM.


was (Author: antidote2):
!my  example looks like this (and  it's really UTF-16BE).png!

> xtokenize and apply xslt to a string does not work  with UTF-16BE
> -----------------------------------------------------------------
>
>                 Key: CAMEL-11846
>                 URL: https://issues.apache.org/jira/browse/CAMEL-11846
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-core
>    Affects Versions: 2.17.5
>            Reporter: Robert Half
>         Attachments: UTF-16BE (with BOM).png, my  example looks like this (and  it's really UTF-16BE).png
>
>
> In XML, the encoding is often declared inside the <?xml ..?> declaration. In general, you
> cannot read the declaration if you don't know the encoding, but XML parsers can detect
> several encodings, which allows them to read it. With that information they can read the
> whole file without knowing the "charset" in the first place.
> xtokenize and xslt use XMLInputFactory#createXMLStreamReader(Reader). But by providing
> a Reader, Camel tells the XML parser that it already knows the encoding, so the parser
> does not detect it.
> Camel also sets the charset to UTF-8 if it is not provided in a header. This makes
> the underlying reader fail when reading UTF-16.
> Using XMLInputFactory#createXMLStreamReader(InputStream) inside XMLTokenExpressionIterator
> works (tried in a patch). But the next xslt step fails again, because it again uses a Reader.
> See the Stack Overflow question for reference:
> [https://stackoverflow.com/questions/46322376/apache-camel-to-handle-encoding-declared-in-xml-file]
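
The difference between the two factory methods described above can be reproduced outside Camel with plain JDK StAX. The following is only a minimal sketch, not Camel's actual code: the class name EncodingDetectionSketch and its helper methods are hypothetical, the sample document is made up, and the behaviour shown assumes the StAX implementation bundled with the JDK.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not exist in Camel.
class EncodingDetectionSketch {

    // Encodes the text as UTF-16BE and prepends the big-endian BOM (0xFE 0xFF).
    static byte[] utf16beWithBom(String xml) {
        byte[] body = xml.getBytes(StandardCharsets.UTF_16BE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFE;
        out[1] = (byte) 0xFF;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    // Advances to the first start element and returns its local name.
    private static String firstElement(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                return reader.getLocalName();
            }
        }
        return null;
    }

    // InputStream variant: the parser sees the raw bytes and can detect the encoding itself.
    static String rootViaInputStream(byte[] bytes) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(bytes));
            try {
                return firstElement(reader);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("unexpected parse failure", e);
        }
    }

    // Reader variant: the bytes are already decoded as UTF-8 (the fallback charset
    // mentioned in the issue), so the parser never gets a chance to detect UTF-16.
    static boolean failsViaUtf8Reader(byte[] bytes) {
        Reader utf8 = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(utf8);
            firstElement(reader);
            return false;
        } catch (XMLStreamException e) {
            return true; // the mis-decoded BOM is not valid content before the root element
        }
    }

    public static void main(String[] args) {
        byte[] doc = utf16beWithBom("<?xml version=\"1.0\" encoding=\"UTF-16\"?><root/>");
        System.out.println("InputStream root element: " + rootViaInputStream(doc));
        System.out.println("UTF-8 Reader fails: " + failsViaUtf8Reader(doc));
    }
}
```

The sketch only illustrates why handing the parser an InputStream lets it honour the BOM and the XML declaration, while a pre-decoded UTF-8 Reader does not. Any real fix would have to be applied where Camel creates its readers for xtokenize and xslt.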



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
