axis-java-dev mailing list archives

From "Ryo Neyama" <ney...@trl.ibm.co.jp>
Subject Re: Double byte characters in SOAP msg question
Date Thu, 17 Jan 2002 02:22:56 GMT
David,

> Suppose I want to send a SOAP message containing a text node that
> consists of double-byte characters, such as Japanese characters.
>
> Do I have to do anything special in Axis to create and send this message?
> Do I have to do anything to make sure UTF-8 encoding is handled, or does the
> parser take care of it? It sounds like the default encoding in XML parsers
> is UTF-16. Who does the conversion?
>
> Has anyone tried this to make sure it works?

It depends on how Axis passes the message to the XML parser: as a byte stream (InputStream) or as a character stream (Reader).

If the input XML document is provided as an InputStream, the XML parser
determines the encoding from the "encoding" declaration in the XML
declaration. For example, given <?xml version="1.0" encoding="Shift_JIS"?>,
the parser decodes the input stream as Shift_JIS, and Axis receives the
Shift_JIS characters in the document as ordinary Java strings, i.e. UTF-16
strings. If the encoding is UTF-16 and the input stream begins with a byte
order mark (BOM), which indicates whether the stream is big-endian or
little-endian, a parser that treats the BOM as an ordinary character will
report a parsing error, since nothing may precede the XML declaration. A
conforming parser should accept it, however: the XML 1.0 specification
requires UTF-16 entities to begin with a BOM and does not count the BOM as
part of the document.
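A minimal sketch of the InputStream case, using the JAXP DOM parser bundled with the JDK rather than Axis itself (the class name ByteStreamEncodingDemo is made up for illustration, and the JDK is assumed to include the Shift_JIS charset). The document is encoded as Shift_JIS bytes and handed to the parser as a byte stream; the parser picks the decoder from the encoding declaration, and the text comes back as an ordinary Java string.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ByteStreamEncodingDemo {

    /** Parses raw bytes; the parser chooses the decoder from the XML declaration. */
    static String parseTextFromBytes(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            Document doc = builder.parse(in);
            // getTextContent() returns an ordinary Java (UTF-16) String
            return doc.getDocumentElement().getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        String text = "\u65e5\u672c\u8a9e"; // Japanese for "Japanese language"
        String xml = "<?xml version=\"1.0\" encoding=\"Shift_JIS\"?><msg>" + text + "</msg>";
        // Encode the whole document as Shift_JIS bytes, matching its declaration
        byte[] bytes = xml.getBytes(Charset.forName("Shift_JIS"));
        System.out.println(parseTextFromBytes(bytes).equals(text)); // true
    }
}
```

The ASCII part of the declaration is identical in Shift_JIS, which is what lets the parser read the declaration before switching decoders.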

If the input XML document is provided as an InputStreamReader with an
appropriate encoding, or as a Reader that wraps such an InputStreamReader,
the XML parser receives characters that have already been decoded, and
therefore it ignores any "encoding" declaration. In this case, Axis can
handle the strings in the XML document correctly, provided the Reader was
created with the right encoding.
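The Reader case can be sketched the same way (again plain JAXP, not Axis; CharacterStreamDemo is a hypothetical name). Here the declaration deliberately says Shift_JIS while the bytes are really UTF-8. Because the parser is given an InputSource wrapping a Reader, the declared encoding is ignored and only the Reader's decoder matters.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharacterStreamDemo {

    /** Parses already-decoded characters; the encoding declaration is not used for decoding. */
    static String parseTextFromReader(Reader reader) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String text = "\u65e5\u672c\u8a9e";
        // The declaration claims Shift_JIS, but the bytes below are UTF-8
        String xml = "<?xml version=\"1.0\" encoding=\"Shift_JIS\"?><msg>" + text + "</msg>";
        byte[] utf8Bytes = xml.getBytes(StandardCharsets.UTF_8);
        // The caller, not the parser, chooses the decoder
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            System.out.println(parseTextFromReader(reader).equals(text)); // true
        }
    }
}
```

If the Reader were created with the wrong charset, the parser would have no way to notice, which is why the encoding must be right before the characters reach it.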

In SOAP over HTTP, the encoding is typically specified by the charset
parameter of the Content-Type header. RFC 3023 prescribes how to determine
the encoding when it is specified in the Content-Type header, in the
"encoding" declaration, or in both. Axis should also follow the RFC, but I
haven't checked whether it does.
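A simplified sketch of the RFC 3023 precedence rules, not a description of what Axis actually does: a charset parameter in Content-Type takes precedence over the encoding declaration; text/xml without one defaults to us-ascii; application/xml without one leaves detection to the XML parser (BOM and encoding declaration), which corresponds to passing it an InputStream. The class and method names are hypothetical, and the header parsing skips corner cases such as escaped quotes.

```java
import java.util.Locale;
import java.util.Optional;

public class ContentTypeCharset {

    /** Returns the charset parameter of a Content-Type header value, if present. */
    static Optional<String> charsetParam(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = p.substring(eq + 1).trim();
                // Strip optional surrounding quotes, e.g. charset="utf-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    /**
     * Simplified RFC 3023 precedence (hypothetical helper):
     * charset parameter wins; text/xml defaults to us-ascii;
     * otherwise empty, meaning "let the XML parser detect the encoding".
     */
    static Optional<String> resolveCharset(String contentType) {
        Optional<String> charset = charsetParam(contentType);
        if (charset.isPresent()) {
            return charset;
        }
        String mediaType = contentType.split(";")[0].trim().toLowerCase(Locale.ROOT);
        if (mediaType.equals("text/xml")) {
            return Optional.of("us-ascii");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset("text/xml; charset=\"Shift_JIS\"")); // Optional[Shift_JIS]
        System.out.println(resolveCharset("text/xml"));                       // Optional[us-ascii]
        System.out.println(resolveCharset("application/xml"));                // Optional.empty
    }
}
```

A caller could then wrap the HTTP body in an InputStreamReader when a charset is returned, and pass the raw InputStream to the parser otherwise.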

Best regards,
    Ryo Neyama @ IBM Research, Tokyo Research Laboratory
    Internet Technology
    neyama@trl.ibm.co.jp


