axis-java-user mailing list archives

From Andreas Veithen <andreas.veit...@skynet.be>
Subject Re: Invalid UTF-8 character encoding in SOAP response
Date Mon, 09 Jun 2008 20:58:51 GMT
Aman,

D869 DE1A is actually the surrogate pair for the character with code
point 2A61A, which is encoded as F0AA989A in UTF-8 (see
http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi). The two other
character references (&#xD858;&#xDF4C;) correspond to
another character. I'm not an expert, but the XML specs don't mention  
surrogate pairs and I think that the correct way of encoding the  
character as a character reference should be &#x2A61A; in this case.  
This definitely looks like a bug in the XML parser. I would try
replacing the XML parser with a newer version of the same parser or with
another parser. I'm not familiar with Axis 1, so I don't know what
kind of parser (SAX or StAX) it uses. Maybe somebody else on the list  
can give a hint?
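
To make the arithmetic above concrete, here is a minimal Java sketch. It is not Axis code; the class name SurrogateDemo and the helpers toCharRefs and utf8Hex are hypothetical names chosen for illustration. It only shows how a surrogate pair can be combined into a single code point before a character reference is written, using standard java.lang.String and java.lang.Character methods.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {

    /**
     * Hypothetical helper: writes each non-ASCII code point as ONE hexadecimal
     * character reference. Markup characters such as '<' and '&' are not
     * escaped here, to keep the sketch short.
     */
    static String toCharRefs(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            // codePointAt combines a valid high/low surrogate pair into one code point
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            // advance by 2 chars for a supplementary character, otherwise by 1
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Hypothetical helper: hex dump of the UTF-8 encoding of a string. */
    static String utf8Hex(String s) {
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String first = "\uD869\uDE1A";   // surrogate pair from the failing response
        String second = "\uD858\uDF4C";  // the other pair in the same response
        System.out.println(toCharRefs(first));   // &#x2A61A;
        System.out.println(utf8Hex(first));      // F0AA989A
        System.out.println(toCharRefs(second));  // &#x2634C;
    }
}
```

Iterating by code point rather than by char is what keeps each pair together; a loop that escapes every char separately would produce the &#xD869;&#xDE1A; form from the original report.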

Andreas


On 9 June 08, at 22:18, Amandeep Singh wrote:

> Hi All,
>
> I am using Axis 1.3. If the response contains a CJK character in
> UTF-8, Axis converts it into an XML entity. On the receiver side,
> XML parsing fails, saying that it is an invalid XML entity.
>
> The character used has the UTF-8 encoding F0AA989A, and Axis converts it
> into &#xD869;&#xDE1A;&#xD858;&#xDF4C;. The parser fails at the first
> entity.
>
> Any ideas/hints would be greatly appreciated.
>
> Thanks,
> Aman



