cocoon-users mailing list archives

From Joerg Heinicke <>
Subject Re: output-encoding in HTMLGenerator, please help!
Date Tue, 14 Jan 2003 20:37:37 GMT
Hi Yury,

so we agree? The bug is in HTMLGenerator, but the expected encoding 
isn't UTF-8 (reading from doesn't work for me 
(NullPointerException)), but ISO-8859-1 or maybe the default encoding of 
the JVM. Can you file a bug in bugzilla?



Yury Mikhienko wrote:
> Hi Joerg!
> Thanks for your reply.
> Pure Tidy works properly (output stream encoding is the same as the input stream).
> The problem, from my point of view, is in the input stream encoding of the
> transformer (or of the streamer, if the xpath value is null) in HTMLGenerator,
> because the Tidy DOM parser returns a KOI8-R encoded document (the same as the
> Tidy input document), but HTMLGenerator needs, I guess, a UTF-8 encoded document
> in the input stream for its transformer or streamer.
> What do you think of my guess?
>>Hello Yuri,
>>I can only confirm the bug in the HTML generator. It seems it cannot read 
>>the KOI8-R encoded file correctly. I tested it with your HTML snippet 
>>saved to a static file.
>>serializer.setOutputProperty(OutputKeys.ENCODING, "KOI8-R"); of course 
>>does not help, because that only affects the output. Configuring the 
>>serializer in the sitemap to KOI8-R works correctly if the input file 
>>is not encoded in KOI8-R (and, I guess, not in some other more or less 
>>exotic encodings either).
>>If it were a bug in the serializer, a character reference like &#240; 
>>would be OK, because a character that is not directly available in this 
>>encoding must be expressed by such a reference.
>>I hope I didn't say anything wrong ;-) Yuri, I think it's best to 
>>file a bug in Bugzilla at
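
For illustration only (this is not from the thread and not HTMLGenerator's code): a minimal Java sketch of how KOI8-R bytes decoded as ISO-8859-1 could lead to a reference like &#240;. The class name EncodingMismatchDemo and the helper decode are hypothetical. In KOI8-R the Cyrillic capital letter Pe (U+041F) is the single byte 0xF0. Read as ISO-8859-1, that byte becomes U+00F0, which KOI8-R cannot represent, so a serializer writing KOI8-R would have to emit it as a character reference. This fits the diagnosis above, but it does not prove what HTMLGenerator actually does internally.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of Cocoon or JTidy.
class EncodingMismatchDemo {

    // Decode raw bytes using the named charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Cyrillic capital letter Pe, written as an escape so the source
        // file's own encoding does not matter.
        String original = "\u041f";

        // What a KOI8-R encoded input file would contain for this letter.
        byte[] koi8Bytes = original.getBytes(Charset.forName("KOI8-R"));

        // Correct reading: decode with the encoding the file was written in.
        String right = decode(koi8Bytes, "KOI8-R");

        // Suspected faulty reading: the same bytes decoded as ISO-8859-1.
        String wrong = decode(koi8Bytes, "ISO-8859-1");

        System.out.println("byte value:        " + (koi8Bytes[0] & 0xFF));
        System.out.println("decoded correctly: " + right.equals(original));
        System.out.println("wrong code point:  " + wrong.codePointAt(0));
    }
}
```

The wrong code point is 240, the same number as in the reference &#240;. Setting the output encoding on the serializer cannot undo this, because the characters are already wrong by the time they reach it.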
