axis-c-user mailing list archives

From "McCullough, Ryan" <>
Subject RE: Axis with UTF-8
Date Wed, 04 Feb 2009 01:54:18 GMT
The Xerces parser does not correctly parse the 3-byte UTF-8 character. It should return all 3 bytes,
but instead it returns 1 byte of uninitialized memory (0xCD).

From: McCullough, Ryan []
Sent: Tuesday, February 03, 2009 6:48 PM
To: Apache AXIS C User List
Cc: Antonczyk, Ryszard
Subject: Axis with UTF-8

We are using Axis1 checked out from Subversion, along with Xerces-C version 2.2.0.

We are having trouble using Axis to retrieve UTF-8 characters. Is there any additional setup required?

Here is where we think things are going awry.

axis\xml\xerces\XMLParserXerces.cpp::parse(bool ignoreWhitespace, bool peekIt)
At about line 125 there is this call:
// parse next token
m_bCanParseMore = m_pParser->parseNext(m_ScanToken);

It looks like the parseNext() function is converting a 3-byte UTF-8 character into 1 byte.
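As a rough sanity check on the expected output: any Unicode code point from U+0800 to U+FFFF encodes to exactly three UTF-8 bytes, so a correct parse of such a character should hand back three bytes, not one. The sketch below is a plain UTF-8 encoder; `encodeUtf8` is a hypothetical helper written for illustration only, not part of Axis or Xerces-C.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not an Axis or Xerces-C API): encode one Unicode
// scalar value as a UTF-8 byte string.
std::string encodeUtf8(std::uint32_t cp) {
    // Precondition: a valid scalar value (no surrogates, at most U+10FFFF).
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    std::string out;
    if (cp < 0x80) {                                          // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                                  // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                                // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                                  // 4 bytes: 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, `encodeUtf8(0xF905)` produces the three bytes EF A4 85, which is the sequence in the hex dump from the web service.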

Here is the hex data being returned from our web service:
00000808h: EF A4 85                                        ; ï¤...
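To confirm what those three bytes should decode to, here is a minimal decoder sketch. `decodeUtf8` is a hypothetical helper for illustration, not an Axis or Xerces-C function, and it does not reject overlong forms or surrogates. Applied to EF A4 85 it yields the single code point U+F905 (a CJK compatibility ideograph) and consumes all 3 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an Axis or Xerces-C API): decode the first UTF-8
// sequence in buf[0..n). On success, stores the code point in cp and the
// number of bytes consumed in len, then returns true. On failure, returns
// false (cp and len may have been overwritten).
bool decodeUtf8(const unsigned char* buf, std::size_t n,
                std::uint32_t& cp, std::size_t& len) {
    assert(buf != nullptr || n == 0);
    if (n == 0) return false;
    const unsigned char b0 = buf[0];
    // The lead byte determines the sequence length and the initial payload bits.
    if (b0 < 0x80)                { cp = b0;        len = 1; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
    else return false;                        // stray continuation or invalid lead byte
    if (n < len) return false;                // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        if ((buf[i] & 0xC0) != 0x80) return false;  // expected a 10xxxxxx byte
        cp = (cp << 6) | (buf[i] & 0x3F);           // append 6 payload bits
    }
    return true;
}
```

Fed the lone byte 0xCD reported at the top of this message, the same function returns false: 0xCD is the lead byte of a two-byte sequence, so on its own it is not valid UTF-8.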

I have also attached the XML that was returned from the web service (xmlout14670.txt, which
is logged on the server).

Ryan McCullough | RightNow Technologies | Integration Tools Engineer
406-556-3162 office | Bozeman, MT |<>
