xerces-c-users mailing list archives

From Jan Suchý <zu...@post.cz>
Subject RE: xerces/ICU unicode alias for weak encoding when serializing/converting to CP
Date Tue, 16 Dec 2008 10:36:50 GMT
Hello again,
I have tried using the XMLFormatter class:

http://xerces.apache.org/xerces-c/apiDocs-2/classXMLFormatter.html#_details

with the flags NoEscapes and UnRep_Replace,

and the problematic character was replaced by:
^Z (the SUB control character, 0x1A)

However, this still does not solve the problem: the Oracle DB XML parser cannot parse
this XML. I get this error:

ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00216: invalid character 26 (0x1A)
Error at line 22
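
For context, this rejection matches the XML 1.0 specification: its Char production excludes most C0 control characters, including 0x1A, so a conforming XML 1.0 parser is expected to refuse that byte. A minimal sketch of the rule follows; isXml10Char is a hypothetical helper name, not a Xerces-C, ICU or Oracle function.

```cpp
// Hypothetical helper (not part of Xerces-C, ICU or Oracle).
// Returns true if the code point matches the XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool isXml10Char(char32_t c) {
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0x10FFFF);
}
```

With this check, isXml10Char(0x1A) is false, while isXml10Char(U'?') and isXml10Char(U'_') are true, which is why "?" or "_" would be acceptable replacements.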

I would like to replace each unrepresentable character with a character of my own choosing
that the parser accepts (for example "?" or "_").
How can I change the default replacement character?
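
One possible workaround, independent of any Xerces-C or ICU setting, is to post-process the serialized bytes. This is only a sketch under stated assumptions: the output is in a single-byte encoding such as iso-8859-2, and every 0x1A byte in it comes from the UnRep_Replace substitution (0x1A cannot occur in well-formed XML 1.0 otherwise). The function name replaceSubBytes is hypothetical.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not part of Xerces-C or ICU).
// Takes an already-serialized buffer in a single-byte encoding and returns a
// copy in which every SUB byte (0x1A) is replaced by `replacement`.
// Assumption: each 0x1A byte was produced by the UnRep_Replace substitution.
std::string replaceSubBytes(std::string bytes, char replacement = '?') {
    std::replace(bytes.begin(), bytes.end(), '\x1A', replacement);
    return bytes;
}
```

The serialized document would be passed through replaceSubBytes(buffer) or replaceSubBytes(buffer, '_') before it is sent to the database. This does not change the transcoder's own substitution character; it only rewrites the result afterwards.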

Thanks in advance for any ideas.

Have a nice day,
Jan


> ------------ Original message ------------
> From: Jan Suchý <zuchy@post.cz>
> Subject: RE: xerces/ICU unicode alias for weak encoding when
> serializing/converting to CP
> Date: 16.12.2008 09:35:40
> ----------------------------------------
> Hello Jesse,
> thank you for your answer :-) It looks promising. I'll look into it.
> Jan
> 
> 
> > ------------ Original message ------------
> > From: Jesse Pelton <jsp@PKC.com>
> > Subject: RE: xerces/ICU unicode alias for weak encoding when
> > serializing/converting to CP
> > Date: 15.12.2008 18:15:49
> > ----------------------------------------
> > The constructors for the Xerces XMLFormatter object all take an UnRepFlags
> > argument that allows you to specify how to handle unrepresentable characters.
> > So does XMLFormatter::formatBuf().  It appears that the transcoder gets to
> > decide what character to replace unrepresentable characters with.
> > 
> > Hope that helps.
> > 
> > -----Original Message-----
> > From: Jan Suchý [mailto:zuchy@post.cz] 
> > Sent: Monday, December 15, 2008 4:25 AM
> > To: c-users@xerces.apache.org
> > Subject: xerces/ICU unicode alias for weak encoding when
> > serializing/converting to CP
> > 
> > Hello all,
> > I need to produce output XML in iso-8859-2 encoding.
> > My input encoding is UTF-8.
> > The UTF-8 XML contains a character that is not representable in
> > iso-8859-2.
> > I am using ICU 3.8, Xerces 2.8 and XQilla svn 702.
> > 
> > When the XML is serialized to iso-8859-2, ICU/Xerces/XQilla writes the
> > problematic character as:
> > 
> > &#x2013;
> > 
> > The problem is that when I send a message in iso-8859-2 containing
> > &#x2013; to Oracle DB, the Oracle parser rejects it with this error:
> > 
> > ORA-31011: XML parsing failed, LPX-00217: invalid character 8211 (U+2013)
> > 
> > So what I am looking for is a way to tell ICU, Xerces or XQilla that such
> > a Unicode character must not appear in the result and must instead be
> > replaced, for example by the character "?", so that the Oracle parser
> > never has to process it.
> > 
> > I would like to find a clean solution, such as telling ICU not to call the
> > callback function, or defining my own alias or behavior for this
> > situation. Is that possible?
> > Any ideas?
> > Thank you
> > Jan Suchy
