cocoon-dev mailing list archives

From Ivo Hulinsky
Subject UTF-8 and Cocoon 1.7.3-dev
Date Thu, 13 Apr 2000 08:31:05 GMT

   I've had a problem with ISO-8859-2 encoding in Cocoon for a while (but not
only with Cocoon :-)).

Cocoon-1.7.2 had some encoding problems, but UTF-8 still worked. Not
any more with 1.7.3-dev.

I have a simple XML file with Czech test text.

<?xml version="1.0" encoding="UTF-8"?>

    <title>Příliš žluťoučký kůň úpěl ďábelské ódy</title>
    <p>Příliš žluťoučký kůň úpěl ďábelské ódy</p>

Now with Cocoon processing (no XSLT, no XSP), the output looks like:

<?xml version="1.0" encoding="UTF-8"?>

    <title>P?íli? ?lu?ou?ký k?? úp?l ?ábelské ódy</title>
    <p>P?íli? ?lu?ou?ký k?? úp?l ?ábelské ódy</p>

<!-- This page was served in 18 milliseconds by Cocoon 1.7.3-dev -->

Every 8-bit character except áéíóúý (aeiouy with acute) shows up as "?". It's
the same with type text/xml, text/plain and text/wml. With text/html I get a
small difference, but only the aeiouy chars with acute are encoded as entities.
This (text/html) can be "correctly?" changed if I modify the HTMLEntities.res
file. I have an HTMLEntities.res with all the Latin Extended-A chars, but
the clients (Netscape, IE) don't understand it. Lynx does :-).
With ISO-8859-2 encoded source XML I get the same output.
When I try changing formatter.[type].encoding (text/xml, text/html) to
ISO-8859-2, I get the same output with "?".
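
The "?" pattern can be reproduced outside Cocoon. This is a minimal sketch, assuming (not confirmed for Cocoon 1.7.3-dev) that some output stage encodes the text as ISO-8859-1: every character that charset cannot represent is replaced by "?", while áéíóúý survive. The class and method names are made up for illustration, and the API used is that of a current JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Hypothetical helper: encode to the given charset and decode back,
    // mimicking an output stage that writes with that charset.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // The Czech test sentence, written with escapes so the result
        // does not depend on the encoding of this source file.
        String czech = "P\u0159\u00edli\u0161 \u017elu\u0165ou\u010dk\u00fd "
                     + "k\u016f\u0148 \u00fap\u011bl \u010f\u00e1belsk\u00e9 \u00f3dy";

        // ISO-8859-1 has no r/s/z/t/c/n/e/d with caron and no u with ring,
        // so getBytes() substitutes '?' for each of them.
        String lossy = roundTrip(czech, StandardCharsets.ISO_8859_1);
        System.out.println(lossy);

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(czech, StandardCharsets.UTF_8).equals(czech));
    }
}
```

If the same substitution shows up whatever formatter encoding is configured, that would suggest the loss happens before the formatter writes its output, but that is only a guess.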

I think the parser in Xerces works. The DOMWriter sample from the Xerces tree
works correctly. The DefaultApplyXSL servlet from the Xalan samples tree works
well enough. Something goes wrong there, but with XML ISO-8859-2 input and
XML UTF-8 output I get untouched 8-bit ISO-8859-2 chars (a bug, but it
works for me).
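
Output that is declared as UTF-8 but still carries raw ISO-8859-2 bytes can be detected with a strict decoder. This is a rough sketch using a current JDK, not anything from Cocoon, Xerces or Xalan. The class and method names are invented, and it assumes the JVM provides the ISO-8859-2 charset (standard JDK builds do).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Hypothetical helper: true if the bytes decode as strict UTF-8,
    // false if the decoder reports a malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String word = "k\u016f\u0148"; // "kun" with ring and caron, from the test sentence

        // Raw ISO-8859-2 bytes: 0x6B 0xF9 0xF2, which is not legal UTF-8.
        byte[] latin2 = word.getBytes(Charset.forName("ISO-8859-2"));
        // Proper UTF-8 encoding of the same word.
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(latin2)); // false
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

Running such a check on the bytes a servlet actually sends would show whether the "UTF-8" output is real UTF-8 or passed-through ISO-8859-2.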

My configuration is:



What's wrong?

						Ivo Hulinsky
