Jan Luehe wrote:
> Bill,
>
>
>>> luehe 2004/07/27 17:43:17
>>>
>>> Modified: coyote/src/java/org/apache/coyote Response.java
>>> Log:
>>> Fixed Bugtraq 6152759 ("Default charset not included in Content-Type
>>> response header if no char encoding was specified").
>>>
>>> According to the Servlet 2.4 spec, calling:
>>>
>>> ServletResponse.setContentType("text/html");
>>>
>>> must yield these results:
>>>
>>> ServletResponse.getContentType() -> "text/html"
>>>
>>> Content-Type response header -> "text/html;charset=ISO-8859-1"
>>>
>>> Notice the absence of a charset in the result of getContentType(), but
>>> its presence (set to the default ISO-8859-1) in the Content-Type
>>> response header.
>>>
>>> Tomcat is currently not including the default charset in the
>>> Content-Type response header if no char encoding was specified.
>>>
>>
>>
>> -1. This gets us right back to the same old problem where we are sending
>> back "image/gif; charset=iso-8859-1", and nobody can read the response.
>
>
> yes, sorry, I had forgotten about that case.
>
>> If we're not going to assume that the UA believes that the default encoding
>> is iso-8859-1 (which is what we are doing now),
>
>
> I think the reason the spec added the requirement to clearly identify
> the encoding in all cases (when using a writer) was because many
> browsers let the user choose which encoding to apply to responses that
> don't declare their encoding, which will result in data corruption if
> the response was encoded in ISO-8859-1 and the user picks an
> incompatible encoding.
AFAIK browsers let the user choose the encoding even if it is specified.
And they do that exactly because some 'smart' servers send a wrong
encoding (like 8859-1) even if the content is different.
If you are using a foreign charset, your data will be either 8859-x
(with x != 1) or UTF-8. In any case it will never be 8859-1 (since the
foreign characters won't exist there). So the effect of the requirement
is basically to break any foreign language.
>
>> then I'd suggest simply doing:
>> setCharacterEncoding(getCharacterEncoding());
>> in Response.getWriter (since the spec only requires that we identify the
>> charset when using a Writer, and we don't really know what it is when using
>> OutputStream).
>
>
> The problem with this is that if you call getWriter() (with your
> proposed fix) followed by getContentType(), the returned content type
> will include a charset, which is against the spec of getContentType():
>
> * If no character encoding has been specified, the
> * charset parameter is omitted.
>
> This is why we need to append the default charset to the value of the
> Content-Type header, if no char encoding has been specified.
On one hand it is required to identify the charset in all cases (so as
not to confuse browsers), but on the other you are not allowed to report
the real encoding used by the writer if it wasn't specified :-) ?
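To make the two views concrete, here is a rough sketch of the split being
discussed. Everything in it is hypothetical (the class ContentTypeSketch
and its methods are invented for illustration, this is not the Coyote
Response code, and it is not what the commit does). It assumes the
default charset is added to the header only once a writer has been
requested, so that an image/gif written through the OutputStream is left
alone.

```java
public class ContentTypeSketch {
    // Default charset named in the thread for responses that use a writer.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    private String contentType;        // value passed to setContentType()
    private String characterEncoding;  // null until explicitly specified
    private boolean usingWriter;       // set once a writer is requested

    public void setContentType(String type) { this.contentType = type; }

    public void setCharacterEncoding(String enc) { this.characterEncoding = enc; }

    // Stand-in for getWriter(): only records that a writer is in use.
    public void useWriter() { this.usingWriter = true; }

    // What the application sees: the charset parameter is omitted
    // unless an encoding was explicitly specified.
    public String getContentType() {
        if (contentType == null) {
            return null;
        }
        return characterEncoding == null
                ? contentType
                : contentType + ";charset=" + characterEncoding;
    }

    // What would go into the Content-Type response header (assumption:
    // the default is appended only when a writer is used).
    public String getContentTypeHeader() {
        if (contentType == null) {
            return null;
        }
        if (characterEncoding != null) {
            return contentType + ";charset=" + characterEncoding;
        }
        return usingWriter
                ? contentType + ";charset=" + DEFAULT_CHARSET
                : contentType;
    }
}
```

With setContentType("text/html") followed by useWriter(), getContentType()
still returns "text/html" while the header value becomes
"text/html;charset=ISO-8859-1". With setContentType("image/gif") and no
writer, both stay "image/gif".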
Costin
>
> Jan
>