cocoon-dev mailing list archives

From Dmitry Melekhov
Subject Re: Remarks on i18n
Date Thu, 03 Feb 2000 11:37:16 GMT
Victor Smirnov wrote:

> Hello!
> My first message was just to point out the problem.
> Here I try to put forward a suggestion for improvement.
> Let's look at the tag
> <?cocoon-format type="xxx/yyy"?>
> We can also include the charset. This will be
> <?cocoon-format type="xxx/yyy" charset="zzz"?>
> The default charset can be set in the config file:
> formater.charset = Cp1251

Hmm. Do you mean that the charset in the document is the document's own
charset, and the formatter charset is the output charset? Or is the output
always UTF-8? UTF-8 output is not suitable for me.
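
To make the proposal concrete, here is a minimal sketch of reading the type and charset pseudo-attributes from the PI data and building the value that getMIMEType would return. The class and method names (CocoonFormatPi, attribute, contentType) are hypothetical and not part of Cocoon's API; the default charset argument stands in for the config file setting. Note that Cp1251 is a Java alias; the name registered for use in HTTP headers is windows-1251.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not Cocoon code: parses the data part of
// <?cocoon-format type="..." charset="..."?> as pseudo-attributes.
public class CocoonFormatPi {
    // Matches name="value" or name='value' pairs.
    private static final Pattern ATTR =
        Pattern.compile("([A-Za-z_][\\w.-]*)\\s*=\\s*([\"'])(.*?)\\2");

    // Returns the value of the named pseudo-attribute, or the fallback if absent.
    public static String attribute(String piData, String name, String fallback) {
        Matcher m = ATTR.matcher(piData);
        while (m.find()) {
            if (m.group(1).equals(name)) {
                return m.group(3);
            }
        }
        return fallback;
    }

    // Builds a Content-Type value; defaultCharset plays the role of the
    // config file default (assumption: it is supplied by the caller).
    public static String contentType(String piData, String defaultCharset) {
        String type = attribute(piData, "type", "text/html");
        String charset = attribute(piData, "charset", defaultCharset);
        return type + "; charset=" + charset;
    }

    public static void main(String[] args) {
        System.out.println(contentType("type=\"text/html\" charset=\"ISO-8859-5\"", "Cp1251"));
        System.out.println(contentType("type=\"text/html\"", "Cp1251"));
    }
}
```

This only decides which charset is announced to the client; the bytes actually written must be encoded with the same charset for the header to be truthful.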

> Then, for instance, the getMIMEType method of the HTMLFormatter class can
> return "text/html; charset=Cp1251"
> To test this idea I simply hardcoded my charset. Under Win98 this solves
> the first :-) problem.
> I've got Russian text instead of '???'.
> As far as I know, Xalan works well with ISO encodings.
> (But when I try to set an encoding in the document
> other than the file.encoding property, it fails with a null pointer exception.)
> So I put ISO-8859-5 everywhere (in XML and XSL) and at least it doesn't
> fail with exceptions as it does with Cp1251.
> The other trouble: Cocoon works differently under Win98 and Linux,
> even though I have the same libraries (xerces-1.0.1, xalan-0.19.1) and
> jdk1.2.2.
> Instead of Russian text I get
> '&iquest;&agrave;&Oslash;&Ograve;&Otilde;&acirc;'
> all under the same conditions.
> Now I try to find out who does this conversion.

I agree. I see the same with Blackdown JDK 1.1.7 and IBM JDK 1.1.8.
I got it working with a dirty hack in the Xerces formatter ;)
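
Both symptoms can be reproduced in isolation. The sketch below is only an illustration, not a diagnosis of where the conversion happens inside Cocoon: it assumes the sample text is the Russian word for "hello" (written with Unicode escapes so the source compiles under any file.encoding) and that the JDK provides the ISO-8859-5 charset. Encoding Cyrillic text to ISO-8859-1 yields '?' for every letter, while reading ISO-8859-5 bytes as ISO-8859-1 yields exactly the characters named by the entities quoted above (iquest, agrave, Oslash, Ograve, Otilde, acirc).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Six Cyrillic letters (U+041F U+0440 U+0438 U+0432 U+0435 U+0442).
    public static final String TEXT = "\u041f\u0440\u0438\u0432\u0435\u0442";

    // Symptom 1: the target charset has no Cyrillic letters, so each
    // unmappable character is replaced by '?' during encoding.
    public static String questionMarks(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Symptom 2: the bytes are correct ISO-8859-5, but something
    // downstream interprets them as ISO-8859-1 (Latin-1).
    public static String misdecoded(String text) {
        byte[] bytes = text.getBytes(Charset.forName("ISO-8859-5"));
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(questionMarks(TEXT));
        // Print code points rather than glyphs to avoid console encoding issues.
        for (char c : misdecoded(TEXT).toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

If the second pattern is what happens, the text was encoded correctly at some point and then decoded with the wrong charset before the formatter escaped it.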

> At this moment I want to find what will be enough to solve the problem. I'm
> afraid the proper implementation will require changing the API. The first
> question:
> 1. How can we pass the desired output charset to the formatter?
> >>
> >> IMO this is a dirty hack, but do you have better ideas?
> >
> >I haven't ever done any i18n stuff, so my input may be somewhat misguided,
> >but this seems like an important issue and shouldn't be left hanging. Is
> >it possible to query the XSLT processor to determine what the document's
> >desired encoding is? If that's not possible yet, would it be appropriate
> >to add yet another PI to cocoon:
> >
> ><?cocoon-output-encoding type="whatever"?>
> I'm afraid this wouldn't be enough. It would only set how the string is
> converted into the byte array that is sent to the client. Meanwhile the
> formatter can still change non-US-ASCII symbols into &xxx; entities. If it
> does (and somehow this does happen), this will not solve the problem.
> - Victor
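
One way to address Victor's concern, sketched here under assumptions, is to make the escaping step aware of the output charset: a character is written as-is when the target charset can encode it and replaced by a numeric character reference only when it cannot. The class name CharsetAwareEscaper is hypothetical, and this is not how the Xerces or Cocoon formatters of the time are implemented.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch: escape only what the output charset cannot represent.
public class CharsetAwareEscaper {
    public static String escape(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            String ch = text.substring(i, i + len);
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (encoder.canEncode(ch)) {
                // Representable in the output charset: keep the character.
                out.append(ch);
            } else {
                // Not representable: fall back to a numeric character reference.
                out.append("&#").append(cp).append(';');
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442 & <b>";
        System.out.println(escape(text, Charset.forName("US-ASCII")));
    }
}
```

With ISO-8859-5 or Cp1251 as the target, Cyrillic letters pass through untouched; with US-ASCII they become references such as &#1055;, which a browser still renders correctly.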

Dmitry Melekhov
