axis-java-user mailing list archives

From "Stadelmann Josef" <josef.stadelm...@axa-winterthur.ch>
Subject Re: addChild with a node that has German characters
Date Wed, 18 May 2011 13:14:07 GMT
What do you mean by "messed up"? What you see in an editor (or any other form of display) and
what is actually in your code's buffer, ready to be displayed, are two different things.
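One way to tell the two apart is to look at the code points actually held in memory instead of at how they are rendered. The sketch below is a hypothetical helper (`CodePointDump.dump` is an illustrative name, not part of Axis2 or AXIOM). It assumes you have already extracted the text of the source and destination nodes as Java `String`s.

```java
public class CodePointDump {
    // Hypothetical helper: renders each code point of s as U+XXXX,
    // independent of any font, editor or console encoding.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String intact = "M\u00e4rz";        // correct in-memory text
        String mangled = "M\u00c3\u00a4rz"; // what a UTF-8 vs Latin-1 mix-up produces
        System.out.println(dump(intact));   // U+004D U+00E4 U+0072 U+007A
        System.out.println(dump(mangled));  // U+004D U+00C3 U+00A4 U+0072 U+007A
    }
}
```

If the destination text dumps as U+00E4, the data is intact and only the display is off. A pair like U+00C3 U+00A4 means the bytes were decoded with the wrong charset somewhere along the way.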

 

1. You need to look at how UTF-8 encodes the German umlauts! For that, go to Wikipedia at http://en.wikipedia.org/wiki/UTF-8
and read until you understand it.

2. UTF-8 does not have a single byte for "ä", "ö" or "ü"; it uses 2 bytes for each of them
(for example, "ä", U+00E4, is encoded as 0xC3 0xA4).

3. How they are presented to you depends on many things: the OS, the application displaying
the characters, lower software layers such as drivers, and whatever conversions were selected.

4. So even perfectly correctly encoded characters can easily give you the impression of
corruption when you look at them.
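Points 2 and 4 can be checked directly with the JDK's standard charset support. This sketch (class and method names are illustrative) prints the UTF-8 bytes of each umlaut and shows what happens when those bytes are decoded as ISO-8859-1, a common cause of the "Ã¤" pattern:

```java
import java.nio.charset.StandardCharsets;

public class UmlautBytes {
    // Hex dump of the UTF-8 encoding of s, space-separated.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Simulates a consumer that wrongly decodes UTF-8 bytes as ISO-8859-1 (Latin-1):
    // each of the two bytes becomes its own character.
    static String misreadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // a-umlaut, o-umlaut, u-umlaut as escapes to keep this file ASCII-only.
        String[] umlauts = {"\u00e4", "\u00f6", "\u00fc"};
        for (String u : umlauts) {
            String wrong = misreadAsLatin1(u);
            System.out.printf("U+%04X -> %s -> misread as %d chars%n",
                    u.codePointAt(0), utf8Hex(u), wrong.length());
        }
    }
}
```

It prints `U+00E4 -> C3 A4`, `U+00F6 -> C3 B6` and `U+00FC -> C3 BC`, each misread as 2 characters. Output is kept ASCII-only so the console's own encoding cannot muddy the result.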

 

[From Wikipedia, for you]

UTF-8 encodes each of the 1,112,064[7] code points in the Unicode character set using one
to four 8-bit bytes (termed "octets" in the Unicode Standard). Code points with lower
numerical values (i.e., earlier code positions in the Unicode character set, which tend to
occur more frequently in practice) are encoded using fewer bytes,[8] making the encoding
scheme reasonably efficient. In particular, the first 128 characters of the Unicode character
set, which correspond one-to-one with ASCII, are encoded using a single octet with the same
binary value as the corresponding ASCII character, making valid ASCII text valid
UTF-8-encoded Unicode text as well.
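The one-to-four-byte rule quoted above can be observed by encoding a few sample code points and counting the bytes. The samples (ASCII "A", "ä", the euro sign U+20AC, and U+1F600 outside the Basic Multilingual Plane) are chosen for illustration and are not from the article:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Number of bytes UTF-8 needs for a single code point.
    static int utf8Length(int codePoint) {
        // Character.toChars yields a surrogate pair for code points above U+FFFF.
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'A', a-umlaut, euro sign, and an emoji outside the BMP.
        int[] samples = {0x41, 0xE4, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
    }
}
```

The output runs 1, 2, 3 and 4 bytes respectively, so "ä" sits in the two-byte range, as point 2 above says.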

 

Josef

 

 

From: Iyengar, Kumar [mailto:kumar_iyengar@bmc.com] 
Sent: Wednesday, 18 May 2011 08:01
To: axis-user@ws.apache.org
Subject: addChild with a node that has German characters

 

Hi all,

 

I am copying one node to another node. The source node contains a child (Text) with German
characters. After adding the child to the new node, the German characters get messed up.

 

The source node contains a string with an 'a umlaut' (ä), and in the destination this
character is messed up.

 

I checked the factory from which the destination node is created, and its default charset
is set to "utf-8".
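In Java a `String` holds UTF-16 code units, so an encoding setting can only matter where text is turned into bytes or bytes back into text. As a stand-in for AXIOM (an assumption: this uses the JDK's W3C DOM `importNode`/`appendChild`, not Axis2's `addChild`), the sketch below copies an element containing "März" into a new document, serializes it with an explicit UTF-8 encoding, and decodes the bytes once with the matching charset and once with ISO-8859-1. `CopyNodeDemo` and `copyAndSerialize` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CopyNodeDemo {
    // Copies the first child of the source root under a new <dest> root
    // and serializes the target document with an explicit UTF-8 encoding.
    static byte[] copyAndSerialize(String sourceXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(new InputSource(new StringReader(sourceXml)));

            Document target = builder.newDocument();
            Element dest = target.createElement("dest");
            target.appendChild(dest);

            // importNode(..., true) deep-copies the element, including its text child.
            Node copy = target.importNode(source.getDocumentElement().getFirstChild(), true);
            dest.appendChild(copy);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(target), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // a-umlaut written as a Java escape so this file stays ASCII-only.
        byte[] bytes = copyAndSerialize("<src><name>M\u00e4rz</name></src>");
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(asUtf8.contains("M\u00e4rz"));          // matching charset
        System.out.println(asLatin1.contains("M\u00c3\u00a4rz"));  // same bytes, wrong charset
    }
}
```

Both lines print `true`: the copy itself keeps the character intact, and the garbled form only appears when the UTF-8 output is read back with a different charset.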

 

Does anyone know why this is happening?

 

Thanks,

 

--kumar

