tomcat-dev mailing list archives

From Costin Manolache
Subject Re: cvs commit: jakarta-tomcat-connectors/coyote/src/java/org/apache/coyote
Date Thu, 29 Jul 2004 07:27:15 GMT
Jan Luehe wrote:

> Bill,
>>> luehe       2004/07/27 17:43:17
>>>  Modified:    coyote/src/java/org/apache/coyote
>>>  Log:
>>>  Fixed Bugtraq 6152759 ("Default charset not included in Content-Type
>>>  response header if no char encoding was specified").
>>>  According to the Servlet 2.4 spec, calling:
>>>    ServletResponse.setContentType("text/html");
>>>  must yield these results:
>>>    ServletResponse.getContentType() -> "text/html"
>>>    Content-Type response header -> "text/html;charset=ISO-8859-1"
>>>  Notice the absence of a charset in the result of getContentType(), but
>>>  its presence (set to the default ISO-8859-1) in the Content-Type
>>>  response header.
>>>  Tomcat is currently not including the default charset in the
>>>  Content-Type response header if no char encoding was specified.
>> -1.  This gets us right back to the same old problem where we are sending
>> back "image/gif; charset=iso-8859-1", and nobody can read the response.
> yes, sorry, I had forgotten about that case.
>> If we're not going to assume that the UA believes that the default
>> encoding is iso-8859-1 (which is what we are doing now),
> I think the reason the spec added the requirement to clearly identify
> the encoding in all cases (when using a writer) was because many
> browsers let the user choose which encoding to apply to responses that
> don't declare their encoding, which will result in data corruption if
> the response was encoded in ISO-8859-1 and the user picks an
> incompatible encoding.

AFAIK browsers let the user choose the encoding even if it is specified.

And they do that precisely because some 'smart' servers send the wrong
encoding (such as ISO-8859-1) even when the content is in a different one.

If you are using a foreign charset, your data will be either ISO-8859-x
(with x != 1) or UTF-8. In either case it will never be ISO-8859-1, since
the foreign characters don't exist there. So the requirement basically
breaks any foreign language.

>> then I'd suggest simply
>> doing:
>>    setCharacterEncoding(getCharacterEncoding());
>> in Response.getWriter (since the spec only requires that we identify the
>> charset when using a Writer, and we don't really know what it is when
>> using OutputStream).
> The problem with this is that if you call getWriter() (with your 
> proposed fix) followed by getContentType(), the returned content type
> will include a charset, which is against the spec of getContentType():
>   * If no character encoding has been specified, the
>   * charset parameter is omitted.
> This is why we need to append the default charset to the value of the
> Content-Type header, if no char encoding has been specified.

On the one hand, it is required to identify the charset in all cases (so as
not to confuse browsers), but on the other hand you are not allowed to
report the real encoding used by the writer if it wasn't specified :-) ?
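
The two requirements discussed in this thread can be reconciled by computing the getContentType() value and the Content-Type header value separately. The following is only a sketch under stated assumptions: ContentTypeSketch, getContentType and headerValue are hypothetical names, this is not the actual Coyote code, and it assumes the default charset is appended only when a Writer is in use (which also avoids "image/gif; charset=iso-8859-1" for OutputStream responses).

```java
// Hypothetical illustration, not Tomcat's implementation.
final class ContentTypeSketch {
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    /** Value for getContentType(): charset omitted unless explicitly set. */
    static String getContentType(String type, String explicitCharset) {
        if (type == null) {
            return null;
        }
        return explicitCharset == null ? type : type + ";charset=" + explicitCharset;
    }

    /** Value for the Content-Type header: default charset added for Writers. */
    static String headerValue(String type, String explicitCharset, boolean usingWriter) {
        if (type == null) {
            return null;
        }
        if (explicitCharset != null) {
            return type + ";charset=" + explicitCharset;
        }
        // No explicit charset: only a Writer implies a known encoding.
        return usingWriter ? type + ";charset=" + DEFAULT_CHARSET : type;
    }

    public static void main(String[] args) {
        System.out.println(getContentType("text/html", null));     // text/html
        System.out.println(headerValue("text/html", null, true));  // text/html;charset=ISO-8859-1
        System.out.println(headerValue("image/gif", null, false)); // image/gif
    }
}
```

Whether the header should carry a default charset that the application never chose is exactly the question raised above; the sketch shows only the mechanics.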


> Jan
