xml-general mailing list archives

From David_N_Bert...@lotus.com
Subject Re: accented characters and xerces j
Date Tue, 06 Nov 2001 05:44:52 GMT

No, all the parser sees is a stream of bytes.  It's up to the parser to
interpret the bytes properly.  With no XML declaration (or a declaration
that names no encoding) and no byte order mark, the parser assumes UTF-8.
In that case, your document is not well-formed XML, because it contains
bytes that are not valid UTF-8.
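
As a rough illustration of the byte-level point (a sketch in Python rather than Xerces-J, using a made-up one-element document), the single byte 246 (0xF6) decodes cleanly as ISO-8859-1 but is rejected when the same bytes are read as UTF-8:

```python
# Hypothetical document saved as ISO-8859-1: byte 0xF6 (decimal 246) is "o with diaeresis".
raw = b"<name>J\xf6rg</name>"

# Interpreted as ISO-8859-1, every byte maps directly to the code point of the same value.
as_latin1 = raw.decode("iso-8859-1")
print(as_latin1)

# Interpreted as UTF-8, 0xF6 cannot start a valid sequence, so decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as err:
    utf8_ok = False
    print("not valid UTF-8:", err.reason)

# The same character correctly encoded as UTF-8 takes two bytes.
as_utf8_bytes = "\u00f6".encode("utf-8")
print(as_utf8_bytes)
```

A parser that defaults to UTF-8 hits the same failure at that byte, before it can make sense of the content around it.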

Dave



                                                                                         
     
From:    Joseph Shraibman <jks@selectacast.net>
Date:    11/05/2001 10:08 PM
To:      general@xml.apache.org
cc:      (bcc: David N Bertoni/CAM/Lotus)
Subject: Re: accented characters and xerces j
Please respond to general



How can that be?  Isn't Unicode conversion done before any of the contents
are looked at?

David_N_Bertoni@lotus.com wrote:

> This is not the best list for Xerces questions.  There is a Xerces-J list
> that you should subscribe to.
>
> The problem is that your document is encoded incorrectly.  There is no
> ASCII character 246, since ASCII only defines characters up to 127.
> However, there _is_ a character defined in ISO-8859-1 with such a value.
> Your document does not contain an XML declaration, so you need to add one
> and specify the correct encoding:
>
>    <?xml version="1.0" encoding="ISO-8859-1"?>
>
> Dave
>
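
The effect of the declaration suggested in the quoted message can be checked with a small sketch. It uses Python's standard-library ElementTree (backed by expat) rather than Xerces-J, and the one-element document and the `try_parse` helper are hypothetical names for illustration only:

```python
import xml.etree.ElementTree as ET

# Hypothetical document body containing the ISO-8859-1 byte 0xF6 (decimal 246).
body = b"<name>J\xf6rg</name>"
declaration = b'<?xml version="1.0" encoding="ISO-8859-1"?>'


def try_parse(data):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        print("parse failed:", err)
        return None


# No declaration and no byte order mark: the parser assumes UTF-8 and rejects 0xF6.
without_decl = try_parse(body)

# With the declaration, the parser knows to read the bytes as ISO-8859-1.
with_decl = try_parse(declaration + body)

print(without_decl, with_decl)
```

The bytes of the document body are identical in both calls; only the declared encoding changes how the parser interprets them.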



--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com


---------------------------------------------------------------------
In case of troubles, e-mail:     webmaster@xml.apache.org
To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org







