xml-general mailing list archives

From "Andy Heninger" <an...@jtcsv.com>
Subject Re: Unicode problem
Date Wed, 02 May 2001 15:58:47 GMT
From the XML spec, http://www.w3.org/TR/REC-xml#charsets

      [2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */ 
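Production [2] maps directly onto a predicate over Unicode code points. The following is a minimal Java sketch of that predicate; the class and method names (`XmlChar`, `isXmlChar`) are illustrative only and do not come from Xerces or Xalan.

```java
// Illustrative sketch: a direct transcription of production [2] Char (XML 1.0).
public final class XmlChar {

    // Returns true if the code point matches [2] Char.
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // Tab, record separator (U+001E), 'A', a-umlaut, U+FFFE, and a supplementary character.
        int[] samples = {0x09, 0x1E, 0x41, 0xE4, 0xFFFE, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %s%n", cp, isXmlChar(cp) ? "allowed" : "not allowed");
        }
    }
}
```

Running `main` reports U+001E, the character named in the exception below, as not allowed, while the German umlaut U+00E4 is allowed.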


0x1E is not in the list, so if one of your new data files happens to contain one, an
"invalid XML character" error is the expected result.
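One way to confirm this with nothing but the JDK is to feed a small document containing U+001E to the bundled SAX parser (a Xerces descendant, so the exact message text may vary by version). The class and method names (`InvalidCharDemo`, `tryParse`) are illustrative. The parse fails whether or not the document declares `encoding="ISO-8859-1"`: the encoding only governs how bytes are decoded into characters, and U+001E is excluded by production [2] in XML 1.0 regardless of encoding.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public final class InvalidCharDemo {

    // Parses the bytes with the JDK's default SAX parser.
    // Returns null if the document is well-formed, otherwise the parser's error message.
    static String tryParse(byte[] xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are outside the scope of this demo.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+001E (record separator) placed directly in element content.
        String body = "<name>M\u00FCller\u001E</name>";

        // Same content, with and without an ISO-8859-1 encoding declaration.
        byte[] declared = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] undeclared = body.getBytes(StandardCharsets.UTF_8); // UTF-8 is the default without a declaration

        System.out.println("declared:   " + tryParse(declared));
        System.out.println("undeclared: " + tryParse(undeclared));
    }
}
```

Both lines print a parser error message rather than `null`.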


Andy Heninger
IBM, Cupertino, CA
heninger@us.ibm.com

  ----- Original Message ----- 
  From: Jonathan Cates 
  To: general@xml.apache.org 
  Sent: Monday, April 30, 2001 5:37 PM
  Subject: Unicode problem


  I am working on a project that is using the German language.  All our XML is
  supposed to declare iso-8859-1 in its header.  Some data was recently loaded to the
  database, and I am suddenly getting the following exception:

  SystemId Unknown; Line 292; Column 24; ; Line#: 292; Column#: 24
  javax.xml.transform.TransformerException: An invalid XML character (Unicode: 0x1e) was found in the element content of the document.
          at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:660)
          at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1118)


  Where the code looks like:

  public void process(Source xml, Source xsl, Writer out){
      try{
          TransformerFactory tFactory;
          Transformer serializer;

          tFactory = TransformerFactory.newInstance();

          serializer = tFactory.newTransformer(xsl);
          serializer.setOutputProperty(OutputKeys.ENCODING, "iso-8859-1");
          serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
          serializer.transform(xml, new StreamResult(out));
      }catch(Exception ex){
          ex.printStackTrace();

  ....

  Is there something I have missed here?  If the document doesn't declare
  encoding="iso-8859-1", should that matter if I set the output encoding explicitly?
  I am using v2 of Xalan/Xerces.  Any help is appreciated.

  Thanks
  Jon
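In the quoted code, `OutputKeys.ENCODING` only controls how the transformation result is serialized. The exception is raised by the parser while it reads the input, and no declared or output encoding makes U+001E legal in XML 1.0. If the stray characters in the database carry no meaning, one possible workaround (not one proposed in this thread) is to drop them before the text is parsed. The sketch below assumes the input is available as a `String`, uses an identity transform in place of the XSL stylesheet, and introduces a hypothetical helper, `stripInvalidXmlChars`. Silently discarding characters can hide data problems, so cleaning the data when it is loaded may be preferable.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class SanitizeThenTransform {

    // Production [2] Char from the XML 1.0 spec.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: drops every code point that is not an XML 1.0 Char
    // (including unpaired surrogates, which codePoints() yields as-is).
    static String stripInvalidXmlChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().filter(SanitizeThenTransform::isXmlChar).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Same shape as the quoted process(), but the input is cleaned before parsing
    // and errors propagate to the caller instead of being printed.
    static void process(String xml, Writer out) throws TransformerException {
        // Identity transform stands in for the real stylesheet in this sketch.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "iso-8859-1");
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        serializer.transform(new StreamSource(new StringReader(stripInvalidXmlChars(xml))),
                new StreamResult(out));
    }

    public static void main(String[] args) throws TransformerException {
        StringWriter out = new StringWriter();
        process("<name>M\u00FCller\u001E</name>", out);
        System.out.println(out);
    }
}
```

With the sample input, `main` prints the `name` element with the U+001E removed instead of throwing.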

