xerces-c-users mailing list archives

From John Lilley <jlil...@datalever.com>
Subject RE: Xerces fails to parse XML due to some invalid UTF8 characters.
Date Wed, 09 Sep 2009 15:45:24 GMT
You could create a custom input stream that filters out the badly encoded bytes before the parser sees them.  UTF-8 is pretty simple to validate:
http://en.wikipedia.org/wiki/Utf8
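
As a rough sketch of the filtering idea, assuming the whole document is already in memory as a std::string, something like the following would drop any bytes that do not form a well-formed UTF-8 sequence. The names utf8SequenceLength and stripInvalidUtf8 are made up for illustration and are not part of Xerces; the code only uses the standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Xerces API): returns the length (1-4) of the
// well-formed UTF-8 sequence starting at s[i], or 0 if the bytes at that
// position do not form a valid sequence.
static std::size_t utf8SequenceLength(const std::string& s, std::size_t i)
{
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    unsigned long cp = 0;

    if (b0 < 0x80)                { return 1; }               // plain ASCII
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else                          { return 0; }  // stray continuation or bad lead byte

    if (i + len > s.size()) return 0;            // sequence is truncated

    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return 0;        // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }

    // Reject overlong encodings, UTF-16 surrogates, and values past U+10FFFF.
    static const unsigned long minCp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    if (cp < minCp[len]) return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp > 0x10FFFF) return 0;
    return len;
}

// Hypothetical filter: copies only well-formed UTF-8 sequences to the output.
// On a bad byte it skips that single byte and tries to resynchronise.
std::string stripInvalidUtf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const std::size_t len = utf8SequenceLength(in, i);
        if (len == 0) { ++i; continue; }
        out.append(in, i, len);
        i += len;
    }
    return out;
}
```

The cleaned buffer could then be handed to the parser from memory, for example through a MemBufInputSource, instead of letting Xerces read the original bytes. Note that this silently discards data, so it is a workaround rather than a fix.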

In addition, we find that we need to remove code points below 0x20, except for 0x9, 0xA, and 0xD.
See the note on Compatibility Characters:
http://www.w3.org/TR/2006/REC-xml11-20060816/#charsets
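
A minimal sketch of that control-character filter, assuming the input is UTF-8 held in a std::string: bytes below 0x20 never occur inside a multi-byte UTF-8 sequence, so a plain byte-wise pass is enough. The function names are hypothetical, chosen only for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true for bytes below 0x20 other than tab (0x9),
// line feed (0xA) and carriage return (0xD).
static bool isDisallowedControl(unsigned char b)
{
    return b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D;
}

// Hypothetical filter: returns a copy of the input with the disallowed
// control bytes removed; every other byte is copied unchanged.
std::string stripXmlControlChars(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        if (!isDisallowedControl(b)) {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

For example, an input containing a 0x01 byte between "a" and "b" would come back as "ab", while tabs and newlines pass through untouched.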

Perhaps there is something in Xerces that will do this automatically?  I don't know of anything.

I think the real solution is to fix the XML emitter.

john

-----Original Message-----
From: Dan Ribe [mailto:dan.ribe@gmail.com] 
Sent: Wednesday, September 09, 2009 1:14 AM
To: c-users@xerces.apache.org
Subject: Xerces fails to parse XML due to some invalid UTF8 characters.

Hi All,
I am using the Xerces library to parse the XML data sent by the server and am facing
an issue. Xerces fails to parse the data if the data from the server contains
an invalid UTF-8 character.

Is there any way to tell the Xerces library to ignore such invalid characters
and return whatever has been parsed successfully, or is there any other solution to this
issue? I am using Xerces library 2.8.

Any pointers on this would make my day!

Cheers!
-Dan
