tomcat-users mailing list archives

From Mark Thomas <ma...@apache.org>
Subject Re: DefaultServlet doesn't set charset
Date Fri, 08 Aug 2008 15:04:10 GMT
Markus Schönhaber wrote:
> Hi,
> 
> AFAICT Tomcat's DefaultServlet doesn't add "; charset=..." to the
> Content-Type header when serving static resources of content type text/*
> and the corresponding resource isn't encoded in ISO-8859-1.
Correct.

> As I understand it, this is a violation of the HTTP 1.1 spec, since RFC
> 2616 says in section 3.7.1:
> |  The "charset" parameter is used with some media types to define the
> |  character set (section 3.4) of the data. When no explicit charset
> |  parameter is provided by the sender, media subtypes of the "text"
> |  type are defined to have a default charset value of "ISO-8859-1" when
> |  received via HTTP. Data in character sets other than "ISO-8859-1" or
> |  its subsets MUST be labeled with an appropriate charset value. See
> |  section 3.4.1 for compatibility problems.
Yes, but... it is debatable in a container environment who is responsible 
for ensuring this requirement is met. If you have multiple text files, each 
with a different character set, Tomcat is going to have to start guessing 
the charset from the content - a path I wouldn't want to go down.

> I'm seeing this with Tomcat 6.0.18, JDK 6u6 on 64-bit Ubuntu Hardy with
> a platform default encoding of "UTF-8".
> To reproduce this, one can simply put a UTF-8-encoded plain text file
> containing non-ASCII characters in webapps/ROOT of a default Tomcat
> 6.0.18 installation and access this file via browser. Instead of the
> non-ASCII characters the browser should display the well-known garbage
> one gets when UTF-8 is decoded using an 8-bit charset (provided the
> browser doesn't do some guessing of the charset based on the content).
And most of them do, don't they?
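
For anyone who wants to see the garbage in question without setting up a
browser test, here is a minimal sketch of what a client does when it takes
the RFC 2616 default literally. The class and method names (MojibakeDemo,
misdecode) are made up for illustration, and StandardCharsets needs Java 7
or later, so it would not compile on the JDK 6 mentioned above.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode the text as UTF-8 (what is stored on disk), then decode the
    // bytes as ISO-8859-1 (what a client assumes when no charset is sent).
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00F6 is two bytes in UTF-8 (0xC3 0xB6), so it comes back as
        // two ISO-8859-1 characters instead of one.
        System.out.println(misdecode("Sch\u00f6nhaber"));
    }
}
```

Pure ASCII text survives this round trip unchanged, which is why the
problem only shows up once a file contains non-ASCII characters.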

> Doing a quick search on bugzilla I only came up with
> https://issues.apache.org/bugzilla/show_bug.cgi?id=41773
> Now I'm unsure whether I do something completely wrong or my
> interpretation of the spec and DefaultServlet's behaviour is correct -
> which would mean that this is a bug.
You could argue, based on the spec extract above, that Tomcat should add 
the platform default encoding to the Content-Type header when that encoding 
isn't ISO-8859-1, although I am wary about what this might break. As Remy 
points out in that bug, if you need that functionality it is easy to extend 
the DefaultServlet, or you could write a simple Filter.

That said, I wouldn't be against a patch that introduced a 
useFileEncodingInCharset parameter (although a shorter name would be better ;)
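
To make the labeling rule discussed above concrete, here is a rough sketch
of the header logic only. It is not DefaultServlet code and not a proposed
patch. CharsetLabeler and label() are hypothetical names. A Filter or a
DefaultServlet subclass would still have to apply the result to the
response, and that part is left out so the example runs without the
servlet API on the classpath.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetLabeler {
    // Return the Content-Type value to send for a static resource,
    // assuming every text file is stored in fileEncoding.
    static String label(String contentType, Charset fileEncoding) {
        if (contentType == null) {
            return null;
        }
        String lower = contentType.toLowerCase(Locale.ROOT);
        boolean isText = lower.startsWith("text/");
        boolean alreadyLabeled = lower.contains("charset=");
        // Simplification: only ISO-8859-1 itself is treated as the HTTP
        // default; subsets such as US-ASCII are labeled explicitly.
        boolean isHttpDefault = StandardCharsets.ISO_8859_1.equals(fileEncoding);
        if (!isText || alreadyLabeled || isHttpDefault) {
            return contentType;
        }
        return contentType + "; charset=" + fileEncoding.name();
    }

    public static void main(String[] args) {
        // Uses the platform default, as the suggested parameter would.
        System.out.println(label("text/plain", Charset.defaultCharset()));
    }
}
```

Note that this assumes a single encoding for all text files. It does
nothing for the case of multiple files with different character sets,
which is the one that would need guessing from the content.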

> Can someone shed some light on this?
HTH,

Mark



---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

