xml-general mailing list archives

From "James Scott" <jsc...@hnt.com>
Subject Re: Unicode problem
Date Wed, 02 May 2001 20:53:16 GMT
I've found that it's useful to call String.trim() on the text before sending it to the XSLT
engine/XML parser. We had a problem a while back with Oracle adding a 0x0 "bonus character"
to the end of XML snippets extracted from the database. Trimming the snippets before
inserting them into the document cured the problem.

JLS
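
As a rough illustration of the tip above, the sketch below shows what String.trim() does with a trailing 0x0. The class name TrimDemo, its method names, and the sample snippets are made up for illustration. One caveat: trim() only strips leading and trailing characters at or below U+0020, so it would not remove a control character in the middle of a document, such as the 0x1e reported at line 292, column 24 in the original question.

```java
final class TrimDemo {
    /** Strips leading and trailing characters at or below U+0020, which includes 0x0. */
    static String clean(String snippet) {
        return snippet.trim();
    }

    /** True if the trimmed snippet still contains the given character. */
    static boolean stillContains(String snippet, char c) {
        return clean(snippet).indexOf(c) >= 0;
    }
}
```

For example, TrimDemo.clean("<p>text</p>\u0000") returns the snippet without the trailing 0x0, while a 0x1e embedded between other characters is left in place.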
  ----- Original Message ----- 
  From: Andy Heninger 
  To: general@xml.apache.org 
  Sent: Wednesday, May 02, 2001 11:58 AM
  Subject: Re: Unicode problem


  From the XML spec,  http://www.w3.org/TR/REC-xml#charsets
   
        [2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
               /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

   
  0x1e is not in the list, so if one of your new data files happens to contain one, an
  invalid XML character error would be the expected result.
   

  Andy Heninger
  IBM, Cupertino, CA
  heninger@us.ibm.com
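
The Char production quoted above translates fairly directly into a range check. The following is a minimal sketch, not part of the original thread: the class name XmlChars and both method names are hypothetical, and silently dropping invalid characters is only one possible policy (replacing them or rejecting the data are equally reasonable).

```java
final class XmlChars {
    /** True if the code point matches production [2] Char of the XML 1.0 spec. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the input with every code point outside Char removed. */
    static String stripInvalid(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);        // reads a full surrogate pair if present
            if (isXmlChar(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);     // advance by 1 or 2 UTF-16 units
        }
        return sb.toString();
    }
}
```

Applied to database text before it is placed in the document, stripInvalid would remove a 0x1e or 0x0 wherever it occurs, while leaving German characters such as ü and ß untouched.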

    ----- Original Message ----- 
    From: Jonathan Cates 
    To: general@xml.apache.org 
    Sent: Monday, April 30, 2001 5:37 PM
    Subject: Unicode problem


    I am working on a project that is using the German language.  All our XML is
    supposed to declare the iso-8859-1 encoding.  Some data was recently loaded into
    the database, and I am suddenly getting the following exception:

    SystemId Unknown; Line 292; Column 24; ; Line#: 292; Column#: 24
    javax.xml.transform.TransformerException: An invalid XML character (Unicode: 0x1e) was found in the element content of the document.
            at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:660)
            at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1118)


    Where the code looks like:
    public void process(Source xml, Source xsl, Writer out) {
        try {
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer serializer = tFactory.newTransformer(xsl);
            serializer.setOutputProperty(OutputKeys.ENCODING, "iso-8859-1");
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.transform(xml, new StreamResult(out));
        } catch (Exception ex) {
            ex.printStackTrace();

    ....

    Is there something I have missed here?  If the doc doesn't have the
    encoding="iso-8859-1" declaration, should this matter if I explicitly set it?  I am
    using v2 of xalan/xerces.  Any help is appreciated.

    Thanks
    Jon
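
On the encoding question above: OutputKeys.ENCODING is an output property, so it governs how the result is serialized and has no bearing on which characters the parser accepts in the input. The sketch below illustrates this with an identity transform from the JDK's built-in JAXP implementation. That is an assumption made for the sake of a self-contained example, since the original code uses a stylesheet with Xalan 2. The class name OutputEncodingDemo and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OutputEncodingDemo {
    /** Runs an identity transform with the same output properties as the code above. */
    static byte[] identity(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "iso-8859-1");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    /** True if the transform fails, e.g. because the input contains an invalid character. */
    static boolean rejected(String xml) {
        try {
            identity(xml);
            return false;
        } catch (TransformerException e) {
            return true;
        }
    }
}
```

With this sketch, input containing 0x1e in element content is rejected even though the output encoding is set, whereas input containing only valid characters (including ü and ß) goes through. The implementation may also log the parse error to standard error before throwing.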

