xml-general mailing list archives

From "Jim Cobban" <jcob...@magma.ca>
Subject UTF-8 output invalid from org.apache.xml.serialize.XMLSerializer
Date Sat, 06 Dec 2003 16:13:46 GMT
When I create a Text node that contains characters outside the basic 128
(7-bit ASCII) range, org.apache.xml.serialize.XMLSerializer does not encode
them as valid UTF-8. What am I doing wrong?

For example, I create Text nodes in the DOM like this:

Document doc;
JTextArea textPrompt;
Text newTextNode;
Element descElt;
...
 newTextNode = doc.createTextNode(textPrompt.getText());
 descElt.appendChild(newTextNode);

The code to serialize the DOM is:

private void saveXml(Document document)
{
    // rename the existing layout file
    new File(fileName).renameTo(new File(fileName + "~"));
    // write the document out
    OutputFormat format = new OutputFormat(document);
    format.setIndenting(true);
    format.setLineWidth(0);
    format.setPreserveSpace(true);
    try {
        XMLSerializer serializer;
        serializer = new XMLSerializer(
            new FileWriter(fileName),
            format);
        serializer.asDOMSerializer();
        serializer.serialize(document);
    }
    catch (IOException ioe)
    {
        ...
    }
}
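A likely cause is the FileWriter: it encodes characters with the platform's default charset (for example windows-1252 on many Windows systems; only since Java 18 has the default been UTF-8), while XMLSerializer, whose OutputFormat encoding defaults to UTF-8, declares encoding="UTF-8" in the XML declaration. With Xerces, passing a byte stream instead, e.g. new XMLSerializer(new FileOutputStream(fileName), format) after format.setEncoding("UTF-8"), should let the serializer do the encoding itself; wrapping the stream in new OutputStreamWriter(new FileOutputStream(fileName), "UTF-8") is another option. Because XMLSerializer is not part of the JDK, the self-contained sketch below swaps in the standard JAXP Transformer to illustrate the same idea: serialize to a byte stream with an explicit UTF-8 encoding, then re-parse the bytes. The class name, the <desc> element and the sample text are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Utf8DomWriter {
    // Serialize to a byte stream: the transformer encodes the characters itself
    // using the ENCODING property, so the bytes agree with the XML declaration.
    static void write(Document doc, OutputStream out) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.transform(new DOMSource(doc), new StreamResult(out));
    }

    // Hypothetical sample document: one <desc> element holding the given text.
    static Document sample(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element desc = doc.createElement("desc");
        desc.appendChild(doc.createTextNode(text));
        doc.appendChild(desc);
        return doc;
    }

    // Re-parse serialized bytes; the parser takes the encoding from the declaration.
    static Document parse(byte[] bytes) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(sample("caf\u00e9"), buf);
        Document back = parse(buf.toByteArray());
        // Compare in Java rather than printing the text, to avoid console-encoding issues
        System.out.println(back.getDocumentElement().getTextContent().equals("caf\u00e9"));
    }
}
```

For a real file, a FileOutputStream opened in a try-with-resources block can take the place of the ByteArrayOutputStream, so the stream is also closed after writing.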

If I type a character such as é (e with acute accent) into the JTextArea
and look at the XML file in an editor that is not UTF-8-aware, I see that
the é has been written as a single byte rather than as the two-byte UTF-8
sequence. If I then try to read the XML file with Xerces, parsing fails
because of the invalid byte sequence.
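The symptom matches what happens when é (U+00E9) passes through a Writer bound to a single-byte charset such as ISO-8859-1 or windows-1252: it becomes the single byte E9, whereas UTF-8 requires the two bytes C3 A9, and a lone E9 is malformed UTF-8. The sketch below assumes ISO-8859-1 as a stand-in for a non-UTF-8 platform default charset; it encodes the character both ways and checks each result with a strict UTF-8 decoder. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterEncodingDemo {
    // Encode text through a Writer bound to a charset, much as FileWriter
    // encodes with the platform default charset.
    static byte[] encode(String text, Charset cs) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(text);
        }
        return out.toByteArray();
    }

    // Strict check: a fresh decoder reports malformed input instead of replacing it.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Format bytes as space-separated uppercase hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String eAcute = "\u00e9"; // e with acute accent, U+00E9
        for (Charset cs : new Charset[] {StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8}) {
            byte[] b = encode(eAcute, cs);
            System.out.println(cs.name() + ": " + hex(b) + ", valid UTF-8: " + isValidUtf8(b));
        }
    }
}
```

Run as-is, it reports E9 (not valid UTF-8) for ISO-8859-1 and C3 A9 (valid UTF-8) for UTF-8.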

How do I get a valid serialization of this DOM into XML using UTF-8?

Jim Cobban   jcobban@magma.ca
34 Palomino Dr.
Kanata, ON, CANADA
K2M 1M1
+1-613-592-9438



