xml-general mailing list archives

From David_N_Bert...@lotus.com
Subject Re: accented characters and xerces j
Date Tue, 06 Nov 2001 02:35:17 GMT

This is not the best list for Xerces questions.  There is a Xerces-J list
that you should subscribe to.

The problem is that your document is encoded incorrectly.  There is no
ASCII character 246, since ASCII only defines characters up to 127.
However, there _is_ a character defined in ISO-8859-1 with such a value.
Your document does not contain an XML declaration, so you need to add one
and specify the correct encoding:

   <?xml version="1.0" encoding="ISO-8859-1"?>
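The effect of the declaration can be checked with a small test. The following is only a sketch under assumptions: it uses the JAXP parser bundled with a current JDK rather than Xerces 1.3.1, and the class name, the method names and the tiny <doc> document are made up for illustration. It parses the same bytes (a single byte 246 inside an element) once with the declaration and once without it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of Xerces or of the original message.
class EncodingDeclDemo {

    // Builds a tiny document whose text content is the single byte 0xF6 (246),
    // optionally preceded by an XML declaration naming ISO-8859-1.
    static byte[] buildDocument(boolean withDeclaration) {
        String declaration = withDeclaration
                ? "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                : "";
        String text = declaration + "<doc>\u00F6</doc>";
        // Encoding as ISO-8859-1 turns U+00F6 into the single byte 0xF6.
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses the bytes and returns the text content of the root element.
    static String parseText(byte[] xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // With the declaration, byte 0xF6 is read as U+00F6.
        System.out.println("with declaration: "
                + parseText(buildDocument(true)));

        // Without it, the parser has to assume UTF-8, where a lone 0xF6
        // is not a valid byte sequence, so parsing is expected to fail.
        try {
            parseText(buildDocument(false));
            System.out.println("without declaration: parsed (unexpected)");
        } catch (Exception e) {
            System.out.println("without declaration: " + e.getMessage());
        }
    }
}
```

The exact error text differs between parser versions, so the sketch only reports whatever message the parser produces.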

Dave



                                                                                         
     
Joseph Shraibman <jks@selectacast.net>
11/05/2001 08:16 PM
Please respond to general

To:      general@xml.apache.org
cc:      (bcc: David N Bertoni/CAM/Lotus)
Subject: accented characters and xerces j



I'm using Xerces 1.3.1

I have a file that contains 'ö', ascii 246


When I try to parse the file using xerces I get:
: 151, 6: An invalid XML character (Unicode: 0x1b6803) was found in the
element content of the document.

Presumably when Java reads the file, before it gets to Xerces, it converts
246 to that Unicode value, but why?  I'm using the default (US) locale.
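One plausible reading of the reported value, offered as a sketch and not as a description of what Xerces 1.3.1 actually does internally: with no encoding declared, the input is treated as UTF-8, and byte 246 (0xF6, bits 11110110) has the form of a lead byte for a four-byte sequence. A decoder that combines the next three bytes without validating them can only produce values from 0x180000 to 0x1BFFFF, all beyond the Unicode limit of 0x10FFFF. The class, the method and the three follow-on bytes below are hypothetical; the real bytes in pr2.xml are not known here.

```java
// Hypothetical sketch of a lenient (non-validating) four-byte UTF-8 decode.
class LenientUtf8Sketch {

    // Takes 3 payload bits from the lead byte and 6 from each following
    // byte, without checking that the followers are continuation bytes.
    static int lenientDecode4(int b0, int b1, int b2, int b3) {
        return ((b0 & 0x07) << 18)
                | ((b1 & 0x3F) << 12)
                | ((b2 & 0x3F) << 6)
                | (b3 & 0x3F);
    }

    public static void main(String[] args) {
        int lead = 246; // 0xF6, the byte from the file

        // Smallest and largest values any three following bytes can give.
        int lowest = lenientDecode4(lead, 0x00, 0x00, 0x00);
        int highest = lenientDecode4(lead, 0xFF, 0xFF, 0xFF);
        System.out.printf("range: 0x%X..0x%X%n", lowest, highest);

        // The value from the error message lies inside that range and
        // above the highest legal code point.
        int reported = 0x1b6803;
        System.out.println("in range: "
                + (reported >= lowest && reported <= highest));
        System.out.println("beyond U+10FFFF: "
                + (reported > Character.MAX_CODE_POINT));

        // Hypothetical follow-on bytes 'v', ' ', 'C' happen to give the
        // reported value exactly; this is an illustration, not file data.
        int example = lenientDecode4(lead, 'v', ' ', 'C');
        System.out.printf("example: 0x%X%n", example);
    }
}
```

If this reading is right, the value depends on whatever three bytes follow the 246 in the file, not on the locale.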

You can get the files involved from:
http://www.selectacast.net/~jks/xml/pr2.xml
http://www.selectacast.net/~jks/xml/pr2.txt is the original text file.


--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com


---------------------------------------------------------------------
In case of troubles, e-mail:     webmaster@xml.apache.org
To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org





