axis-java-user mailing list archives

From "Andreas Veithen" <andreas.veit...@gmail.com>
Subject Re: Encoding of non latin characters (Cyrillic)
Date Thu, 13 Nov 2008 19:46:02 GMT
José,

Neither (a) nor (b) is UTF-8. Both are sequences of XML numeric character
references that refer to Unicode code points. They are strictly equivalent,
except that the first one uses hexadecimal values, while the second
one uses decimal values.
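
To make this concrete, here is a minimal sketch in plain Java that resolves both kinds of reference and compares the results. The class name `NumericCharRefDemo` and the helper `decode` are made up for this illustration; they are not part of Axis, wss4j or commons-lang, and the sketch only handles numeric references, not named entities such as `&amp;`.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericCharRefDemo {
    // Matches decimal (&#1054;) and hexadecimal (&#x41E;) numeric character references.
    private static final Pattern NCR = Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    /** Hypothetical helper: replaces each numeric character reference with its code point. */
    static String decode(String s) {
        Matcher m = NCR.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous reference and this one.
            out.append(s, last, m.start());
            String body = m.group(1);
            int codePoint = body.startsWith("x")
                    ? Integer.parseInt(body.substring(1), 16)  // hexadecimal form
                    : Integer.parseInt(body, 10);              // decimal form
            out.appendCodePoint(codePoint);
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String a = "&#x41E;&#x441;&#x43D;&#x43E;&#x432;&#x43D;&#x43E;&#x435;";
        String b = "&#1054;&#1089;&#1085;&#1086;&#1074;&#1085;&#1086;&#1077;";

        // Both forms resolve to the same eight code points.
        System.out.println(decode(a).equals(decode(b))); // prints true

        // UTF-8 is a byte encoding of those code points: two bytes per Cyrillic letter here.
        byte[] utf8 = decode(a).getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // prints 16
    }
}
```

UTF-8 only comes into play once the decoded string is serialized to bytes; the references themselves are plain ASCII text.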

Andreas

On Thu, Nov 13, 2008 at 18:53, José Ferreiro <jose.ferreiro@gmail.com> wrote:
> Hello all,
>
> I have the Russian Word (taken as example): Основное
>
> that is encoded as UTF-8 by axis as:
>
> &#x41E;&#x441;&#x43D;&#x43E;&#x432;&#x43D;&#x43E;&#x435; (a)
>
> I can transmit this kind of information in a well-formed XML packet using
> axis 1.4, from the server back to the client after a client request. There
> is no problem. The deserialization works perfectly.
>
>
> However, if I try to transmit it using wss4j with encryption, signature and
> timestamp, the following error arises:
>
> org.apache.xml.security.encryption.XMLEncryptionException: An invalid XML
> character (Unicode: 0x1e)
> was found in the element content of the document.
>
> Therefore, in order to avoid invalid characters in the packet, I decided
> to escape all XML chars
> using org.apache.commons.lang.StringEscapeUtils.escapeXml [1]
>
>
>
> On the client, in order to recover the original word, I decided to apply
> unescapeXml [1], which gives this Unicode string:
>
> &#1054;&#1089;&#1085;&#1086;&#1074;&#1085;&#1086;&#1077;
(b)
>
> First, it should be concluded that I am not getting the same Unicode string
> as at the beginning (a) where [(a) != (b)]
>
> I was then wondering what kind of encoding I got.
> I looked at this web site http://2cyr.com/decode/?lang=en to understand more
> and it looks like I got windows-1251 (see [2])
> that can be displayed in a browser as encoding="iso8859-1".
>
> My question is: Why didn't I get UTF-8, and how is it possible that I got (b)?
>
>
> Thank you for your reading and any comments you might have.
>
> José Ferreiro
>
> Many thanks to Martin Gainty and Ognjen Blagojevic for already commenting and
> helping in another thread I posted.
>
>
> [1] -
> http://commons.apache.org/lang/api-release/org/apache/commons/lang/StringEscapeUtils.html
> [2] - http://en.wikipedia.org/wiki/CP1251
>
> PS: Thanks to Martin and
>
> --
> José Ferreiro
> MSc in Communication Systems, EPFL.
>
>
>