cocoon-dev mailing list archives

From Pier Fumagalli <>
Subject Re: [proposal] fixing the encoding problems
Date Sun, 16 Mar 2003 23:51:39 GMT
On 16/3/03 23:38, "Vadim Gritsenko" <> wrote:
>> true. but you can't have chinese text in US-ASCII, right?
> Even if you can, not that anybody will be able to read it ;-)
> So yes, right.

Unicode specifies (somewhere) that any character not representable in the
current charset encoding should be replaced with a "?" (\u003f), which exists
in all representations...
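
To illustrate the substitution on the Java side, here is a minimal sketch.
It only shows what String.getBytes(Charset) does with unmappable characters
in US-ASCII; it says nothing about where (or whether) the Unicode standard
mandates this. The class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

class ReplacementDemo {
    // Encode to US-ASCII and decode again. String.getBytes(Charset) replaces
    // each unmappable character with the charset's replacement byte, which
    // for US-ASCII is '?' (0x3F).
    static String roundTripAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Two Chinese characters after an ASCII prefix.
        System.out.println(roundTripAscii("abc \u4e2d\u6587")); // prints "abc ??"
    }
}
```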

>>> But I am not convinced that it's sitemap's responsibility to worry
>>> about encoding (from SoC POV).
>> I restate:
>> 1) I want a way for serializers to indicate to the pipeline what is
>> the encoding they will be using, so that the pipeline can set the
>> right HTTP header for it.
> +-0, I'm not sure (yet) on this one...

I am almost sure that it should work the other way around as well: the client
can request a specific encoding from the server. See RFC 2616, section 14.2,
page 102: the Accept-Charset header.

I believe that the TextSerializer should return what the client asked for in
its request through the "Accept-Charset" header, if this is present.

If it isn't, it should default to what has been specified in the pipeline
(if we use <map:serialize charset="xxxx"/>) or, failing that, to the "cocoon
global" configuration...
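
The fallback order above could be sketched roughly as follows. This is not
existing Cocoon code: CharsetNegotiation and choose() are hypothetical
names, and the Accept-Charset parsing is simplified (the "*" wildcard and
the implicit ISO-8859-1 rule from RFC 2616 are ignored).

```java
import java.nio.charset.Charset;

class CharsetNegotiation {
    // Order of preference: (1) the supported charset with the highest q
    // value in Accept-Charset, (2) the charset set on the pipeline,
    // (3) the global default. All three parameters are assumptions.
    static String choose(String acceptCharset, String pipelineCharset, String globalDefault) {
        String best = null;
        double bestQ = 0.0;
        if (acceptCharset != null) {
            for (String item : acceptCharset.split(",")) {
                String[] parts = item.trim().split(";");
                String name = parts[0].trim();
                double q = 1.0; // RFC 2616: a missing q parameter means q=1
                for (int i = 1; i < parts.length; i++) {
                    String p = parts[i].trim();
                    if (p.startsWith("q=")) {
                        try {
                            q = Double.parseDouble(p.substring(2).trim());
                        } catch (NumberFormatException e) {
                            q = 0.0; // treat a malformed q as "not acceptable"
                        }
                    }
                }
                // Skip the wildcard (not handled here), q=0, and weaker entries.
                if (name.equals("*") || q <= bestQ) continue;
                if (isSupported(name)) {
                    best = name;
                    bestQ = q;
                }
            }
        }
        if (best != null) return best;
        if (pipelineCharset != null) return pipelineCharset;
        return globalDefault;
    }

    // Charset.isSupported throws on syntactically illegal names; treat
    // those as unsupported instead of failing the request.
    static boolean isSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

For example, choose("utf-8;q=0.8, iso-8859-1", "US-ASCII", "UTF-8") picks
"iso-8859-1", while a request without the header falls back to "US-ASCII".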

>> 2) also, i want a way to overwrite the sitemap-wide behavior of every
>> single serializers, locally, such as
>>  <map:serialize encoding="UTF-8"/>
>> when the global serializer configurations state they will be using
>> something else.
> But this one is OK with me and, moreover, in line with earlier decision:

I'd say to use this only if the client didn't request a particular encoding.

On second thought... The cache should store Unicode characters "as is", not
bytes, as the bytes might change for the same request URL depending on the
different headers in the request...
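
A rough sketch of that idea, assuming a cache keyed by URL only:
CharacterCache and its methods are hypothetical and not part of Cocoon.
The cached value is the character data, and encoding happens per request.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

class CharacterCache {
    // One entry per URL, holding characters rather than encoded bytes.
    private final Map<String, String> contentByUrl = new HashMap<>();

    void put(String url, String characters) {
        contentByUrl.put(url, characters);
    }

    // Encode the cached characters with the charset chosen for this
    // request; returns null on a cache miss.
    byte[] get(String url, Charset charset) {
        String characters = contentByUrl.get(url);
        return characters == null ? null : characters.getBytes(charset);
    }
}
```

The same entry then serves a UTF-8 client and an ISO-8859-1 client with
different bytes, without storing the page twice.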

