cocoon-dev mailing list archives

From Jeremy Quinn <jer...@media.demon.co.uk>
Subject encoding problem in textarea
Date Fri, 03 Dec 2004 13:44:00 GMT
Hi All

I have an editor for XHTML snippets built in CForms using 2.1.7-dev.
It is very basic; it just uses a textarea.
I am having encoding issues that appeared only in the last week or so,
and I cannot work out the solution.

Symptoms:
I have an accented character in my source document.
The document is displayed in the textarea, with that character  
corrupted.
If you save, that character is saved corrupted to disk.
I have the same accented character output by i18n outside of the  
textarea, it displays correctly.

Scenario:
My source document is UTF-8.
My serializer (o.a.c.serialization.HTMLSerializer) outputs UTF-8.
My web.xml's 'form-encoding' parameter is set to UTF-8.
My browser recognises the document as UTF-8.

Behaviour:
The character "é" (e acute) when outside of the textarea is serialised  
as &eacute;.
The same character is serialised as &radic;&copy; when it is within the  
textarea.
The brackets of the XHTML tags in the textarea are output as entities.
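
One possible reading of this symptom, offered as a sketch rather than a diagnosis: "é" encoded as UTF-8 is the two bytes 0xC3 0xA9, and in the MacRoman charset those two bytes are "√" and "©", which are exactly &radic; and &copy;. The standalone JavaScript below (plain Node.js, not flowscript) reproduces that mapping. The two-entry MacRoman table and the name decodeAsMacRoman are illustrative assumptions, and so is the idea that some step decodes the bytes using MacRoman as a platform default.

```javascript
// Hypothetical helper: decode bytes as MacRoman, using only the two
// table entries needed for this example (0xC3 -> U+221A, 0xA9 -> U+00A9).
const MAC_ROMAN_SUBSET = { 0xc3: "\u221a", 0xa9: "\u00a9" };

function decodeAsMacRoman(bytes) {
  let out = "";
  for (const b of bytes) {
    if (b < 0x80) {
      out += String.fromCharCode(b); // ASCII range is identical
    } else if (b in MAC_ROMAN_SUBSET) {
      out += MAC_ROMAN_SUBSET[b];
    } else {
      throw new Error("byte not in the illustrative table: 0x" + b.toString(16));
    }
  }
  return out;
}

const utf8Bytes = Buffer.from("\u00e9", "utf8"); // the two bytes C3 A9
const misread = decodeAsMacRoman(utf8Bytes);     // same bytes read as MacRoman
const correct = utf8Bytes.toString("utf8");      // same bytes read as UTF-8
```

If this reading is right, the corruption happens wherever bytes become characters without UTF-8 being named, not in the serializer.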

I output to log the string being edited. The accented character is  
correct before being added to the widget, correct after being added to  
the widget but before display.

If I edit the character to correct it in the form, the correct  
character is written to the file.
If I do not edit the character, the incorrect characters ("√©", i.e. the
characters represented by &radic;&copy;) are written to the file.

However, regardless of whether I edit it or not, my log message shows  
the correct character after the form has been submitted, before it has  
been written back.
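
A log viewer applies its own encoding when it displays a message, so a character that looks right in the log may not prove that the string itself is right. A hedged way to check is to log code points instead of characters. The sketch below is standalone JavaScript; toCodePoints is a hypothetical helper name, not part of Cocoon.

```javascript
// Hypothetical helper: render each character of a string as U+XXXX so
// the value can be checked independently of the log file's encoding.
function toCodePoints(str) {
  return Array.from(str, function (ch) {
    const hex = ch.codePointAt(0).toString(16).toUpperCase();
    return "U+" + hex.padStart(4, "0");
  }).join(" ");
}

const good = toCodePoints("\u00e9");      // a single e acute
const bad = toCodePoints("\u221a\u00a9"); // the corrupted pair
```

This uses modern JavaScript features (Array.from, padStart), so it would likely need adapting before it could run inside a flowscript.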

Technique:
I read the XML Source to a String (to add to the textarea widget) like  
this:

var string = org.apache.avalon.excalibur.io.IOUtil.toString(
   new java.io.BufferedInputStream(
     org.apache.cocoon.components.source.SourceUtil.getInputSource(
       resolver.resolveURI(uri)
     ).getByteStream()
   )
);
form.lookupWidget("xhtml").setValue(string);
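
The call above passes no charset when turning the byte stream into a String. If that utility falls back to the platform default encoding (an assumption worth checking, not something confirmed here), a UTF-8 file would be misdecoded on a machine whose default is something else. The standalone JavaScript sketch below shows the general principle of naming the encoding explicitly when decoding bytes. It uses the standard TextDecoder; decodeUtf8 is a hypothetical name.

```javascript
// Hypothetical helper: decode bytes with an explicitly named charset
// instead of relying on any platform default.
function decodeUtf8(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// "café" as UTF-8: the final character is the two bytes C3 A9.
const fileBytes = Uint8Array.from([0x63, 0x61, 0x66, 0xc3, 0xa9]);
const text = decodeUtf8(fileBytes);
```

The equivalent in the flowscript would be to pass the encoding at the point where the bytes are read, if the utility in use offers such an overload.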

I write the String from the widget back to XML File like this:

var source = resolver.resolveURI(uri);
var dom = parser.parseDocument(
   new org.xml.sax.InputSource(
     new java.io.StringReader(form.lookupWidget("xhtml").getValue())
   )
);

// basically copied from the samples
var outputStream = null;
try {
   var tf = Packages.javax.xml.transform.TransformerFactory.newInstance();
   if (source instanceof Packages.org.apache.excalibur.source.ModifiableSource
       && tf.getFeature(Packages.javax.xml.transform.sax.SAXTransformerFactory.FEATURE))
   {
     outputStream = source.getOutputStream();
     var transformerHandler = tf.newTransformerHandler();
     var transformer = transformerHandler.getTransformer();
     transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.INDENT, "true");
     transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.METHOD, "xml");
     transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.ENCODING, "UTF-8");
     transformerHandler.setResult(new Packages.javax.xml.transform.stream.StreamResult(outputStream));
     var streamer = new Packages.org.apache.cocoon.xml.dom.DOMStreamer(transformerHandler);
     streamer.stream(dom);
   } else {
     throw ("error.source.not-writeable");
   }	
} catch (e) {
   throw(e);
} finally {
   if (outputStream != null) {
     try {
       outputStream.flush();
       outputStream.close();
     } catch (error) {
       cocoon.log.error("Could not flush/close outputstream: " + error);
     }
   }	
}


Can anyone see what I am doing wrong?

I have tried using  
org.apache.cocoon.components.serializers.HTMLSerializer, but CForms  
does not work with it.

I have tried different doctypes.

I tried pre-encoding the accented character as an entity in the source
document, and the textarea showed the raw entity.

I really have made this work before, but now I am completely stumped !!!

Thanks for any suggestions.

regards Jeremy


--------------------------------------------------------

                   If email from this address is not signed
                                 IT IS NOT FROM ME

                         Always check the label, folks !!!!!
--------------------------------------------------------
