cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [proposal] fixing the encoding problems
Date Mon, 17 Mar 2003 20:15:48 GMT
Pier Fumagalli wrote:
> On 16/3/03 23:38, "Vadim Gritsenko" <vadim.gritsenko@verizon.net> wrote:
> 
>>>true. but you can't have chinese text in US-ASCII, right?
>>
>>Even if you can, not that anybody will be able to read it ;-)
>>So yes, right.
> 
> 
> Unicode specifies (somewhere) that any character non representable by the
> current charset-encoding should be replaced with a "?" (\u003f) which exists
> in all representations...
> 
> 
>>>>But I am not convinced that it's sitemap's responsibility to worry
>>>>about encoding (from SoC POV).
>>>
>>>I restate:
>>>
>>>1) I want a way for serializers to indicate to the pipeline what is
>>>the encoding they will be using, so that the pipeline can set the
>>>right HTTP header for it.
>>
>>+-0, I'm not sure (yet) on this one...
> 
> 
> I am almost sure that it should be made all-the-way around: the client can
> request a specific encoding from the server: See RFC 2616 section 14.2 page
> 102: the Accept-Charset header.
> 
> I believe that the TextSerializer should return what the client asked in its
> request through the "Accept-Charset" header, if this is present.
> 
> If it isn't, it should default to what has been specified in the pipeline
> (if we use <map:serialize charset="xxxx"/>) or default to the "cocoon
> global" configuration...

Oh, that's right, I forgot about the client 'forcing' a charset. Great 
point.
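
To make the fallback order concrete, here is a rough sketch of how I read it: the client's Accept-Charset first, then the charset given on the sitemap's serialize element, then the global configuration. The class and method names (CharsetNegotiation, chooseCharset, parseAcceptCharset) are made up for illustration and are not existing Cocoon API, and the header parsing is deliberately simplified (no quoted strings; a "*" wildcard simply defers to the server side).

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cocoon: picks the output charset for a response.
class CharsetNegotiation {

    /** Returns the charset names from an Accept-Charset value, highest q-value first. */
    static List<String> parseAcceptCharset(String header) {
        List<String> names = new ArrayList<>();
        List<Double> weights = new ArrayList<>();
        if (header == null) {
            return names;
        }
        for (String part : header.split(",")) {
            String[] pieces = part.trim().split(";");
            String name = pieces[0].trim();
            if (name.isEmpty()) {
                continue;
            }
            double q = 1.0; // RFC 2616: a missing q parameter means q=1
            for (int i = 1; i < pieces.length; i++) {
                String p = pieces[i].trim();
                if (p.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(p.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat an unparsable weight as "not acceptable"
                    }
                }
            }
            if (q <= 0.0) {
                continue; // q=0 means the client refuses this charset
            }
            // Insert after every entry with an equal or higher weight (keeps header order on ties).
            int pos = 0;
            while (pos < weights.size() && weights.get(pos) >= q) {
                pos++;
            }
            names.add(pos, name);
            weights.add(pos, q);
        }
        return names;
    }

    /** True if this JVM can encode to the named charset. */
    static boolean isSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false; // syntactically illegal charset name
        }
    }

    /**
     * Order of precedence: client request, then the sitemap attribute,
     * then the global default. sitemapCharset may be null.
     */
    static String chooseCharset(String acceptCharset, String sitemapCharset, String globalCharset) {
        for (String name : parseAcceptCharset(acceptCharset)) {
            if (name.equals("*")) {
                break; // client accepts anything: let the server side decide
            }
            if (isSupported(name)) {
                return name;
            }
        }
        return sitemapCharset != null ? sitemapCharset : globalCharset;
    }
}
```

With this, a request carrying "ISO-8859-1;q=0.5, UTF-8" would get UTF-8 regardless of the sitemap, while a request with no Accept-Charset header falls through to the sitemap value and then to the global one.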

>>>2) also, i want a way to override the sitemap-wide behavior of every
>>>single serializer, locally, such as
>>>
>>> <map:serialize encoding="UTF-8"/>
>>>
>>>when the global serializer configurations state they will be using
>>>something else.
>>
>>But this one is OK with me and, moreover, in line with an earlier decision:
>>http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101826371615914&w=2
> 
> 
> I'd say to use this only if the client didn't request a particular
> encoding...
> 
> On another thought... The cache should store unicode characters "as is", not
> bytes, as those might change for the same request URL depending on the
> different headers in the request...

Uh, another good point.
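
A small illustration of why, assuming the cache holds a String and bytes are only produced per response. CachedCharacters and serialize are hypothetical names, not Cocoon classes. It also shows the "?" substitution as Java does it: String.getBytes(Charset) replaces unmappable characters with the charset's replacement bytes, which for US-ASCII is "?".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the cached value is characters, encoded late, once per response.
class CachedCharacters {

    /** Encodes the cached text with whatever charset was negotiated for this request. */
    static byte[] serialize(String cachedText, Charset negotiated) {
        return cachedText.getBytes(negotiated);
    }

    public static void main(String[] args) {
        String cached = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE, one cached character

        byte[] utf8 = serialize(cached, StandardCharsets.UTF_8);        // 0xC3 0xA9
        byte[] latin1 = serialize(cached, StandardCharsets.ISO_8859_1); // 0xE9
        byte[] ascii = serialize(cached, StandardCharsets.US_ASCII);    // 0x3F, i.e. '?'

        System.out.println(utf8.length + " " + latin1.length + " " + (char) ascii[0]);
    }
}
```

The same cache entry yields three different byte sequences, so caching the bytes would tie the entry to one request's headers.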

Stefano.

