cocoon-dev mailing list archives

From "Victor Smirnov" <vic...@uwc.ru>
Subject Re: Remarks on i18n
Date Thu, 03 Feb 2000 10:27:13 GMT
Hello!

My first message was just to point out the problem.
Here I try to put forward a suggestion for improvement.

Let's look at the processing instruction
<?cocoon-format type="xxx/yyy"?>
We could also include the charset in it. This would be:

<?cocoon-format type="xxx/yyy" charset="zzz"?>
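A PI has no real attributes, only a data string, so the formatter would have to pull `charset` out of that string itself. The following is a minimal sketch of that step. The class `FormatPI` and its method are hypothetical names, and it assumes pseudo-attributes are always written with double quotes, as in the examples above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormatPI {
    // Matches name="value" pseudo-attributes in a PI's data,
    // e.g. type="text/html" charset="ISO-8859-5"
    private static final Pattern ATTR =
            Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    // Returns the value of the named pseudo-attribute, or null if it is absent.
    public static String attribute(String piData, String name) {
        Matcher m = ATTR.matcher(piData);
        while (m.find()) {
            if (m.group(1).equals(name)) {
                return m.group(2);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String data = "type=\"text/html\" charset=\"ISO-8859-5\"";
        System.out.println(attribute(data, "type"));    // text/html
        System.out.println(attribute(data, "charset")); // ISO-8859-5
    }
}
```

A `null` result means the PI carries no charset, so the caller can fall back to a configured default.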

The default charset could be set in the config file (cocoon.properties):

formater.charset = Cp1251
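Reading that default could look like the sketch below. The key `formater.charset` is spelled as proposed above. The class name and the `ISO-8859-1` fallback are hypothetical choices, not existing Cocoon behaviour. The sketch also checks that the JVM actually supports the configured name, so a typo in the config does not surface later as an exception during formatting.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.util.Properties;

public class CharsetConfig {
    // Hypothetical fallback used when the property is missing or unusable.
    static final String FALLBACK = "ISO-8859-1";

    public static String defaultCharset(Properties props) {
        String cs = props.getProperty("formater.charset", FALLBACK).trim();
        try {
            // isSupported throws IllegalCharsetNameException (an
            // IllegalArgumentException) for malformed names
            return Charset.isSupported(cs) ? cs : FALLBACK;
        } catch (IllegalArgumentException e) {
            return FALLBACK;
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("formater.charset = Cp1251\n"));
        System.out.println(defaultCharset(props)); // Cp1251
    }
}
```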

Then, for instance, the getMIMEType method of the HTMLFormatter class could
return "text/html; charset=Cp1251".
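A sketch of such a formatter's MIME type logic is shown below. `CharsetHTMLFormatter` is a hypothetical stand-in, not Cocoon's actual HTMLFormatter. One design choice here differs from the example above. "Cp1251" is a Java alias, so the sketch passes it through `Charset.forName(...).name()` to emit the canonical name "windows-1251", which is the name registered for use in HTTP headers.

```java
import java.nio.charset.Charset;

public class CharsetHTMLFormatter {
    private final String charset; // null means: send no charset parameter

    public CharsetHTMLFormatter(String charset) {
        this.charset = charset;
    }

    public String getMIMEType() {
        if (charset == null) {
            return "text/html";
        }
        // Canonical name: the Java alias "Cp1251" becomes "windows-1251"
        return "text/html; charset=" + Charset.forName(charset).name();
    }

    public static void main(String[] args) {
        System.out.println(new CharsetHTMLFormatter("Cp1251").getMIMEType());
        // text/html; charset=windows-1251
    }
}
```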

To test this idea I simply hardcoded my charset. Under Win98 this solves
the first :-) problem: I get Russian text instead of '???'.

As far as I know, Xalan works well with ISO encodings.
(But when I set an encoding in the document other than the one in the
file.encoding property, it fails with a NullPointerException.)
So I put ISO-8859-5 everywhere (in the XML and the XSL), and at least it
doesn't fail with exceptions the way it does with Cp1251.

The other trouble is that Cocoon behaves differently under Win98 and Linux,
even though I have the same libraries (xerces-1.0.1, xalan-0.19.1) and
JDK 1.2.2. Under the same conditions, instead of Russian text I get
'&iquest;&agrave;&Oslash;&Ograve;&Otilde;&acirc;'
Now I'm trying to find out what does this conversion.
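One plausible reading of that output is this. The six entities name U+00BF, U+00E0, U+00D8, U+00D2, U+00D5 and U+00E2. Read as ISO-8859-5 bytes (0xBF, 0xE0, 0xD8, 0xD2, 0xD5, 0xE2), they spell the Russian word "Привет". So ISO-8859-5 bytes were probably decoded somewhere as ISO-8859-1 and then escaped as Latin-1 entities. The original text is not given in the message, so "Привет" is an inference. Which component does the decoding is not known. A platform-default encoding that differs between Win98 and Linux would be consistent with this, but that is an assumption. The sketch below only reproduces the byte-level effect.

```java
import java.nio.charset.Charset;

public class MojibakeDemo {
    // Encode text with one charset, then decode the bytes with another.
    // This models a component that reads bytes with the wrong charset.
    public static String misdecode(String text, String encodedAs, String decodedAs) {
        byte[] bytes = text.getBytes(Charset.forName(encodedAs));
        return new String(bytes, Charset.forName(decodedAs));
    }

    public static void main(String[] args) {
        // "Привет" written as \u escapes to keep the source ASCII-only
        String russian = "\u041F\u0440\u0438\u0432\u0435\u0442";
        String garbled = misdecode(russian, "ISO-8859-5", "ISO-8859-1");
        // Each char is U+00BF U+00E0 U+00D8 U+00D2 U+00D5 U+00E2, i.e.
        // the characters named by &iquest;&agrave;&Oslash;&Ograve;&Otilde;&acirc;
        garbled.chars().forEach(c -> System.out.printf("U+%04X%n", c));
    }
}
```

Passing an explicit `Charset` everywhere bytes and strings meet, instead of relying on `file.encoding`, removes this source of platform-dependent behaviour.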

At this point I want to find out what would be enough to solve the problem.
I'm afraid a proper implementation will require changing the API. The first
question:

1. How can we pass the desired output charset to the formatter?
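One possible answer, sketched under assumptions: resolve the charset once, with the PI value overriding the cocoon.properties default, and hand the result to the formatter alongside the document. This does not reflect Cocoon's real Formatter signature. `CharsetResolver`, the `charset` parameter key, and the `ISO-8859-1` fallback are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CharsetResolver {
    // Hypothetical key under which the chosen charset reaches the formatter.
    public static final String CHARSET_KEY = "charset";

    // Precedence: charset from the PI, then the configured default, then a fallback.
    public static String resolve(String piCharset, String configDefault) {
        if (piCharset != null && !piCharset.isEmpty()) {
            return piCharset;
        }
        if (configDefault != null && !configDefault.isEmpty()) {
            return configDefault;
        }
        return "ISO-8859-1";
    }

    // Builds the parameter map a formatter could receive with the document.
    public static Map<String, String> formatterParameters(String piCharset, String configDefault) {
        Map<String, String> params = new HashMap<>();
        params.put(CHARSET_KEY, resolve(piCharset, configDefault));
        return params;
    }
}
```

Keeping the decision in one place means getMIMEType and the byte conversion cannot disagree about which charset is in use.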

>>
>> IMO this is a dirty hack, but do you have better ideas?
>
>I haven't ever done any i18n stuff, so my input may be somewhat misguided,
>but this seems like an important issue and shouldn't be left hanging. Is
>it possible to query the XSLT processor to determine what the document's
>desired encoding is? If that's not possible yet, would it be appropriate
>to add yet another PI to cocoon:
>
><?cocoon-output-encoding type="whatever"?>

I'm afraid this wouldn't be enough. That PI only sets how the string is
converted into the byte array that is sent to the client. But meanwhile the
formatter may change non-US-ASCII symbols into &xxx; entities. If it does
(and somehow this does happen), the PI will not solve the problem.
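A formatter avoids this by escaping only what the target charset cannot represent, then encoding the result with that same charset. The sketch below is a minimal version of that idea. `CharsetAwareEscaper` is a hypothetical name, it handles text content only (not attribute values), and it uses numeric character references instead of named entities.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class CharsetAwareEscaper {
    // Escapes markup characters, and any other character the target
    // charset cannot encode, as a numeric character reference (&#NNNN;).
    public static String escape(String text, String charset) {
        CharsetEncoder enc = Charset.forName(charset).newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:
                    String ch = new String(Character.toChars(cp));
                    if (enc.canEncode(ch)) {
                        out.append(ch);          // representable: keep as is
                    } else {
                        out.append("&#").append(cp).append(';');
                    }
            }
        });
        return out.toString();
    }

    // Encode with the SAME charset used for escaping and announced in the MIME type.
    public static byte[] toBytes(String text, String charset) {
        return escape(text, charset).getBytes(Charset.forName(charset));
    }
}
```

With ISO-8859-5 as the target, Russian text passes through unchanged. With US-ASCII it becomes `&#1055;&#1088;...`, which the browser still renders correctly.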


- Victor


