cocoon-dev mailing list archives

From Ivo Hulinsky <h...@fido.cz>
Subject UTF-8 and Cocoon 1.7.3-dev
Date Thu, 13 Apr 2000 08:31:05 GMT
Hi,

   I've had a problem with ISO-8859-2 encoding in Cocoon for a while (but not
only with Cocoon :-)).

Cocoon-1.7.2 had some encoding problems, but UTF-8 still worked. With
1.7.3-dev it no longer does.

I have a simple XML file with Czech test text.

---
<?xml version="1.0" encoding="UTF-8"?>

<html>
  <head>
    <title>Příliš žluťoučký kůň úpěl ďábelské ódy</title>
  </head>
  <body>
    <p>Příliš žluťoučký kůň úpěl ďábelské ódy</p>
  </body>
</html>
---

Now Cocoon processing, with no XSLT and no XSP. The output looks like:

---
<?xml version="1.0" encoding="UTF-8"?>

<html>
  <head>
    <title>P?íli? ?lu?ou?ký k?? úp?l ?ábelské ódy</title>
  </head>
  <body>
    <p>P?íli? ?lu?ou?ký k?? úp?l ?ábelské ódy</p>
  </body>
</html>

<!-- This page was served in 18 milliseconds by Cocoon 1.7.3-dev -->
---

Every 8-bit character other than áéíóúý (aeiouy with acute) shows up as "?".
It's the same with types text/xml, text/plain and text/wml. With text/html
there is a small difference, but only the aeiouy-with-acute characters are
encoded as entities. This (text/html) can be "correctly?" changed when I
modify the HTMLEntities.res file. I have an HTMLEntities.res with all the
Latin Extended-A characters, but the clients (Netscape, IE) don't
understand it. Lynx does :-).
With ISO-8859-2 encoded source XML I get the same output.
When I try changing formatter.[type].encoding (text/xml, text/html) to
ISO-8859-2, I get the same output with "?".
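
The "?" pattern in the output above can be reproduced outside Cocoon. The
following is only a sketch of one possible explanation, not a confirmed
diagnosis: it assumes the intended test sentence is the usual Czech pangram,
and that somewhere in the pipeline the characters are being written with an
ISO-8859-1 encoder. The names PANGRAM and round_trip are hypothetical
helpers, not part of Cocoon, Xerces or Xalan.

```python
# Hypothetical diagnostic sketch; nothing here is Cocoon code.
# Assumption: the intended test sentence is the common Czech pangram.
PANGRAM = "Příliš žluťoučký kůň úpěl ďábelské ódy"


def round_trip(text: str, charset: str) -> str:
    """Encode text into charset, replacing characters the charset
    cannot represent with '?', then decode the bytes back."""
    return text.encode(charset, errors="replace").decode(charset)


if __name__ == "__main__":
    # UTF-8 and ISO-8859-2 can represent every character of the sentence;
    # ISO-8859-1 has the acute vowels but none of the other Czech letters.
    for charset in ("utf-8", "iso-8859-2", "iso-8859-1"):
        print(f"{charset:>10}: {round_trip(PANGRAM, charset)}")
```

If the ISO-8859-1 line matches the Cocoon output character for character,
that would suggest looking for a place where the default (Latin-1) encoding
is used instead of the configured one.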

I think the parser in Xerces works. The DOMWriter sample from the Xerces
tree works correctly. The DefaultApplyXSL servlet from the Xalan samples
tree works well too. Something goes wrong, but with XML ISO-8859-2 input
and XML UTF-8 output I get untouched 8-bit ISO-8859-2 characters (a bug,
but it works for me).

My configuration is:

Cocoon-1.7.3-dev
Xerces-1.0.3
Xalan-1.0.0
Jserv-1.1
SUN-JDK1.1/IBM-JDK1.1.8/SUN-JDK1.2.2/BD-JDK1.2.2

Untouched cocoon.properties.

What's wrong?

						Ivo Hulinsky

