cocoon-dev mailing list archives

From "Victor Smirnov" <>
Subject Re: Remarks on i18n
Date Thu, 03 Feb 2000 10:27:13 GMT

My first message was just to point out the problem.
Here I try to put forward a suggestion for improvement.

Let's look at the processing instruction
<?cocoon-format type="xxx/yyy"?>
We could also include the charset there. This would give

<?cocoon-format type="xxx/yyy" charset="zzz"?>
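A minimal Java sketch of how a formatter could read both pseudo-attributes from the PI data and fall back to a configured default when charset is absent. The class CocoonFormatPI and its methods are hypothetical, not part of Cocoon; the regex assumes simple name="value" pairs, since an XML parser hands PI data over as plain text rather than as attributes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cocoon: reads pseudo-attributes from the
// data of a <?cocoon-format ...?> processing instruction.
public class CocoonFormatPI {

    // PI data is plain text, so name="value" pairs are matched by hand.
    private static final Pattern PSEUDO_ATTR =
            Pattern.compile("([A-Za-z][\\w.-]*)\\s*=\\s*\"([^\"]*)\"");

    /** Returns all name="value" pairs found in the PI data. */
    public static Map<String, String> parse(String piData) {
        Map<String, String> attrs = new HashMap<>();
        Matcher m = PSEUDO_ATTR.matcher(piData);
        while (m.find()) {
            attrs.put(m.group(1), m.group(2));
        }
        return attrs;
    }

    /** Returns the charset pseudo-attribute, or defaultCharset if the PI has none. */
    public static String charsetOf(String piData, String defaultCharset) {
        return parse(piData).getOrDefault("charset", defaultCharset);
    }

    public static void main(String[] args) {
        String data = "type=\"text/html\" charset=\"Cp1251\"";
        System.out.println(parse(data).get("type"));                       // text/html
        System.out.println(charsetOf(data, "ISO-8859-1"));                  // Cp1251
        System.out.println(charsetOf("type=\"text/html\"", "ISO-8859-1")); // ISO-8859-1
    }
}
```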

The default charset could be set in the config file, for example:

formater.charset = Cp1251

Then, for instance, the getMIMEType method of the HTMLFormatter class could
return "text/html; charset=Cp1251".
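A sketch of that idea, assuming the default is read with java.util.Properties under the proposed key formater.charset. CharsetAwareHTMLFormatter is a hypothetical class, not Cocoon's HTMLFormatter, and the ISO-8859-1 fallback is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical formatter, not Cocoon's HTMLFormatter: it only shows how a
// configured default charset could end up in the MIME type.
public class CharsetAwareHTMLFormatter {

    private final String charset;

    /** Reads formater.charset from the configuration; ISO-8859-1 is an assumed fallback. */
    public CharsetAwareHTMLFormatter(Properties config) {
        this.charset = config.getProperty("formater.charset", "ISO-8859-1").trim();
    }

    /** Returns the Content-Type value, including the charset parameter. */
    public String getMIMEType() {
        return "text/html; charset=" + charset;
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        // Same line as in the proposed config file.
        config.load(new StringReader("formater.charset = Cp1251\n"));
        System.out.println(new CharsetAwareHTMLFormatter(config).getMIMEType());
        // prints: text/html; charset=Cp1251
    }
}
```

One design point: Cp1251 is Java's name for this encoding, while the IANA-registered name that browsers look for is windows-1251. Charset.forName("Cp1251").name() returns that canonical name, so a real implementation might send it in the header instead.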

To test this idea I simply hardcoded my charset. Under Win98 this solves
the first :-) problem:
I get Russian text instead of '???'.
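One plausible explanation of the '???' (an assumption, not a diagnosis of Cocoon itself): when text is encoded with a charset that has no Cyrillic letters, such as ISO-8859-1, Java's String.getBytes(Charset) replaces every unmappable character with the charset's replacement byte, which is '?'. The class name below is illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // "Privet" ("hello") in Cyrillic, written as escapes so the source stays ASCII.
    static final String RUSSIAN = "\u041F\u0440\u0438\u0432\u0435\u0442";

    /** Encodes text with the charset and decodes the bytes back with the same charset. */
    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) substitutes the charset's replacement bytes for
        // characters it cannot map; for ISO-8859-1 that is '?'.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(RUSSIAN, StandardCharsets.ISO_8859_1)); // ??????
        System.out.println(roundTrip(RUSSIAN, Charset.forName("Cp1251")).equals(RUSSIAN)); // true
    }
}
```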

As far as I know, Xalan works well with ISO encodings.
(But when I try to set an encoding in the document
other than the file.encoding property, it fails with a NullPointerException.)
So I put ISO-8859-5 everywhere (in the XML and XSL), and at least it doesn't
fail with exceptions as it does with Cp1251.
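Using ISO-8859-5 everywhere helps because the XML, the XSL and the output then agree on a single byte mapping: ISO-8859-5 and Cp1251 both cover Cyrillic, but assign different bytes to the same letters. A small illustrative sketch (class and method names are made up) that prints the bytes of the first two capital letters in each charset:

```java
import java.nio.charset.Charset;

public class CyrillicBytes {

    /** Returns the bytes of text in the named charset as unsigned hex, e.g. "C0 C1". */
    static String hex(String text, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String letters = "\u0410\u0411"; // Cyrillic capital letters A and BE
        System.out.println(hex(letters, "ISO-8859-5")); // B0 B1
        System.out.println(hex(letters, "Cp1251"));     // C0 C1
    }
}
```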

The other trouble: Cocoon behaves differently under Win98 and Linux,
even though I have the same libraries (xerces-1.0.1, xalan-0.19.1) and ...
Instead of Russian text I get ...,
all under the same conditions.
Now I am trying to find out which component does this conversion.
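A hedged guess at where a platform difference can creep in: any component that turns bytes into characters (or back) without naming a charset uses the JVM default, which on JVMs before JDK 18 is derived from the operating system's locale and so can differ between Win98 and Linux. The sketch below (class and method names are illustrative) prints the defaults and simulates a writer and a reader that disagree.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultEncodingDemo {

    // "Privet" in Cyrillic, as escapes so the source file stays ASCII.
    static final String RUSSIAN = "\u041F\u0440\u0438\u0432\u0435\u0442";

    /** Encodes with one charset and decodes with another, like two components that disagree. */
    static String transcode(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        // Platform-dependent values: compare this output on both machines.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset().name());

        Charset cp1251 = Charset.forName("Cp1251");
        // Writer and reader agree, so the text survives.
        System.out.println(transcode(RUSSIAN, cp1251, cp1251).equals(RUSSIAN)); // true
        // Reader assumes Latin-1: each Cp1251 byte becomes an accented Latin letter.
        System.out.println(transcode(RUSSIAN, cp1251, StandardCharsets.ISO_8859_1).equals(RUSSIAN)); // false
    }
}
```

Passing the charset explicitly, for example to new String(bytes, charset) or new OutputStreamWriter(out, charset), removes the dependence on file.encoding.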

At the moment I want to find out what would be enough to solve the problem. I'm
... the proper implementation will require changing the API. The first ...

1. How can we pass the desired output charset to the formatter?
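One possible shape for an answer, sketched under assumptions: the caller picks the charset (from a PI or from the config default) and passes the same Charset both to the code that writes bytes and to getMIMEType, so the two cannot disagree. EncodingFormatter and PlainHTMLFormatter are hypothetical names, not Cocoon's Formatter API, and the formatter takes an already serialized String instead of a DOM tree to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

/** Hypothetical API: the caller decides the charset and hands it to the formatter. */
interface EncodingFormatter {
    void format(String markup, OutputStream out, Charset charset) throws IOException;

    String getMIMEType(Charset charset);
}

public class PassCharsetDemo {

    /** Minimal formatter that writes already serialized markup in the requested charset. */
    static class PlainHTMLFormatter implements EncodingFormatter {
        public void format(String markup, OutputStream out, Charset charset) throws IOException {
            // The charset is explicit, so file.encoding plays no part here.
            Writer w = new OutputStreamWriter(out, charset);
            w.write(markup);
            w.flush();
        }

        public String getMIMEType(Charset charset) {
            return "text/html; charset=" + charset.name();
        }
    }

    public static void main(String[] args) throws IOException {
        EncodingFormatter f = new PlainHTMLFormatter();
        Charset cs = Charset.forName("ISO-8859-5");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        f.format("<p>\u0410</p>", out, cs);
        System.out.println(f.getMIMEType(cs)); // text/html; charset=ISO-8859-5
        System.out.println(out.size());        // 8: one byte per character
    }
}
```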

>> IMO this is a dirty hack, but do you have better ideas?
>I haven't ever done any i18n stuff, so my input may be somewhat misguided,
>but this seems like an important issue and shouldn't be left hanging. Is
>it possible to query the XSLT processor to determine what the document's
>desired encoding is? If that's not possible yet, would it be appropriate
>to add yet another PI to cocoon:
><?cocoon-output-encoding type="whatever"?>

I'm afraid this wouldn't be enough. That way we only set how the string is converted
into the byte array that is sent to the client. But meanwhile the formatter can change "non-..."
symbols into &xxx;. If it does, and it somehow does, this will not
solve the problem.
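To illustrate why the formatter needs the charset as well: whether a character has to be written as a reference like &#1040; depends on the output charset. A minimal sketch, assuming numeric character references (not named entities) and escaping only what the target charset cannot encode; the class name is hypothetical, and markup characters such as < and & are deliberately left out to keep it short.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CharsetAwareEscaper {

    /**
     * Leaves characters the target charset can encode untouched and writes the
     * rest as numeric character references (&#NNNN;).
     */
    static String escape(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                sb.append(ch);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String ab = "\u0410\u0411"; // Cyrillic capital letters A and BE
        // Cp1251 has both letters, so nothing is escaped.
        System.out.println(escape(ab, Charset.forName("Cp1251")).equals(ab)); // true
        // ISO-8859-1 has neither, so both become references.
        System.out.println(escape(ab, StandardCharsets.ISO_8859_1)); // &#1040;&#1041;
    }
}
```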

- Victor
