cocoon-dev mailing list archives

From Pier Fumagalli <>
Subject Re: [proposal] fixing the encoding problems
Date Mon, 17 Mar 2003 21:25:42 GMT
On 17/3/03 18:23, "Dirk-Willem van Gulik" <> wrote:

>> I am almost sure that it should work all the way around: the client can
>> request a specific encoding from the server. See RFC 2616 section 14.2 page
>> 102: the Accept-Charset header.
> Or an _ordered_list_ of those as input. See also Accept-Language while you
> are at it, and the Accept type as well - they are all dimensions of the
> same problem. And they are not orthogonal; i.e. there is an easy semantic
> coupling between languages and charset - and the Accept list may prompt
> you to send a gif or pdf in some cases.

Yes... You're absolutely right... I was re-reading that part of HTTP on the
tube today, and it gets pretty nasty at that point...

Basically, correct me if I'm wrong, from what I understand the client sends
a list of "preferred" encodings, while the application should "negotiate"
charset, language and type...
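
As a rough illustration of that negotiation step, here is a minimal Python sketch that parses a q-valued list such as Accept-Charset and picks the most preferred variant the server actually has. The names `parse_q_list` and `negotiate` are made up for this example, and the sketch deliberately skips some RFC 2616 details (for instance, the rule that ISO-8859-1 is implicitly acceptable when it is not mentioned and no "*" is present).

```python
def parse_q_list(header):
    """Parse 'a;q=0.8, b' into [(value, q)], most preferred first."""
    items = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        value = pieces[0].lower()
        if not value:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in pieces[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        items.append((value, q))
    # sorted() is stable, so equal q-values keep their header order
    return sorted(items, key=lambda item: -item[1])


def negotiate(header, available, default=None):
    """Return the entry of `available` the client prefers, or `default`."""
    prefs = parse_q_list(header)
    by_lower = {name.lower(): name for name in available}
    refused = {value for value, q in prefs if q <= 0}
    for value, q in prefs:
        if q <= 0:
            continue
        if value == "*":
            # wildcard: first available entry that was not explicitly refused
            for name in available:
                if name.lower() not in refused:
                    return name
        elif value in by_lower:
            return by_lower[value]
    return default
```

The same two functions could be applied to Accept-Language; the Accept header would need extra handling for media ranges such as `text/*`, which this sketch does not attempt.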

It gets quite complicated, because for the same URL the client might request
a Japanese, shift_jis, text/html view, while another might request a simple

It basically implies that the URL is a resource _for_real_ and that the
client can decide the way in which it wants to receive it.

>> On another thought... The cache should store unicode characters "as is", not
>> bytes, as those might change for the same request URL depending on the
>> different headers in the request...
> You'd have to track which Accept, Accept-Language and Accept-Charset you
> negotiated on, as applications may (also) do i18n and localization
> optimizations such as swapping ',' into '.', or abusing charsets and doing
> locale-specific normalizations of the unicode cast.

Yes yes yes...
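
To make the idea concrete, here is a small Python sketch of a cache that stores unicode text and keys each entry on the negotiated type, language and charset as well as the URL. `VariantCache` and its methods are hypothetical names for this example only, not anything in Cocoon, and the URL in the usage below is a placeholder.

```python
class VariantCache:
    """Toy cache: one entry per (url, media type, language, charset)."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(url, media_type, language, charset):
        # Header values are case-insensitive, so normalise them in the key
        return (url, media_type.lower(), language.lower(), charset.lower())

    def put(self, url, media_type, language, charset, text):
        # Store characters (str), not bytes
        self._entries[self._key(url, media_type, language, charset)] = text

    def get(self, url, media_type, language, charset):
        return self._entries.get(self._key(url, media_type, language, charset))

    def get_bytes(self, url, media_type, language, charset):
        # Encode only on the way out, using the negotiated charset
        text = self.get(url, media_type, language, charset)
        return None if text is None else text.encode(charset)


cache = VariantCache()
url = "http://example.invalid/page"  # placeholder URL for the example
cache.put(url, "text/html", "ja", "shift_jis", "こんにちは")
cache.put(url, "text/html", "en-US", "iso-8859-1", "Hello")
```

Whether charset belongs in the key or could be applied purely at output time depends on the point made above: if the application does charset-specific tricks, the stored text itself may differ per charset.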

But there is a problem... Proxies and caches...

If, for example, in my corporation there are two guys, one using Windows in
jp and one using Linux in en_US, if the first guy requests
"", I'll deliver the page the first time in jp,
encoded in shift_jis (let's not track content-type for a sec).

Now, when the second guy requests the same page, I'd have to send it in
en_US maybe encoded in iso-8859-1...

But my corporate proxy (or the Cocoon cache) will cache the first version
it hits, so, to both of them, I'll end up serving the same Japanese
shift_jis content...
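
One HTTP mechanism aimed at exactly this situation is the Vary response header (RFC 2616 section 14.44), which tells a cache which request headers the response was selected on. The Python sketch below is only an illustration under that assumption: `response_headers` and `secondary_key` are invented names, and the second function merely models how a well-behaved shared cache might separate the two users. It says nothing about how any particular proxy behaves.

```python
NEGOTIATED_HEADERS = ("Accept", "Accept-Language", "Accept-Charset")


def response_headers(media_type, language, charset):
    """Headers the origin server could send with a negotiated variant."""
    return {
        "Content-Type": f"{media_type}; charset={charset}",
        "Content-Language": language,
        # Tell caches which request headers selected this variant
        "Vary": ", ".join(NEGOTIATED_HEADERS),
    }


def secondary_key(url, request_headers, vary):
    """Cache key: the URL plus the request headers named in Vary."""
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (url,) + tuple((n, lowered.get(n, "")) for n in names)


url = "http://example.invalid/page"  # placeholder URL for the example
vary = response_headers("text/html", "ja", "shift_jis")["Vary"]
jp_request = {"Accept-Language": "ja", "Accept-Charset": "shift_jis"}
en_request = {"Accept-Language": "en-US", "Accept-Charset": "iso-8859-1"}

# The two users map to different cache entries for the same URL
jp_key = secondary_key(url, jp_request, vary)
en_key = secondary_key(url, en_request, vary)
```

A cache that ignores Vary would still serve the first version it stored, so this only helps with intermediaries that honour the header.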

Not good... Needs more thinking indeed...

