tomcat-users mailing list archives

From André Warnier
Subject Character set issue
Date Sun, 04 Dec 2011 21:57:13 GMT

I need help with a problem on a Tomcat system.  The system is difficult to access, and I 
cannot reach it directly right now (this is Sunday night in Europe).
I know that the system runs Tomcat 6.something, under Oracle/Sun Java 1.6, and that's all 
I can say right now. The platform is RedHat RHEL, current version.

The problem is that, after the update of a webapp (of which I do not have 
the code), it seems that non-US-English "diacritic" characters posted to the webapp from a 
web <form> are now "corrupted". I would like to understand better the Tomcat 
mechanism for reading HTTP request form parameters, so that I can start to figure out what 
is going wrong.

The webapp consists of a single servlet, wrapped by two filters.
The application's web.xml defines the order as
with both filters processing all requests to the servlet.

"filter1" is a commercial product used on many Tomcat sites.
"filter2" is my own filter (and it is the only part of which I have the source code)
"servlet" is also a commercial product of which I do not have the code, and the one which 
has just been updated.

What I would like to know is : with a setup such as the above, how does Tomcat determine 
in which /character set/ the body of the POST will be read ?

For example :
Suppose that we have 2 html forms, form1 and form2.  Both forms are functionally 
identical, and contain a text input box named "name1".
The form form1 has an html declaration which specifies it as having the charset "iso-8859-1".
The form form2 has an html declaration which specifies it as having the charset "UTF-8".

The user, in the input box "name1" of each form, types the string "TÜV" (second character 
= uppercase U with umlaut) and then posts the form to the webapp.
The user browser is the same in all cases.

If the servlet executes a request.getParameter("name1"), what are the factors which can 
determine how it receives the value of this parameter ?
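
To make the example concrete, here is a minimal sketch of what I assume happens with the two forms. It uses only java.net.URLEncoder and java.net.URLDecoder to imitate a browser encoding the form body and a server decoding it; it is not Tomcat's actual code, and the class and method names (FormCharsetDemo, post, read) are invented for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Tomcat or of the webapp.
class FormCharsetDemo {

    // Imitates the browser: percent-encodes the value using the form's charset.
    static String post(String value, String formCharset) {
        try {
            return URLEncoder.encode(value, formCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Imitates the server: decodes the posted value using the charset it assumes.
    static String read(String encoded, String serverCharset) {
        try {
            return URLDecoder.decode(encoded, serverCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String typed = "T\u00DCV"; // "TÜV"

        String fromForm1 = post(typed, "ISO-8859-1"); // T%DCV
        String fromForm2 = post(typed, "UTF-8");      // T%C3%9CV
        System.out.println("form1 sends: name1=" + fromForm1);
        System.out.println("form2 sends: name1=" + fromForm2);

        // Matching charsets give back the original string.
        System.out.println(read(fromForm1, "ISO-8859-1").equals(typed)); // true
        System.out.println(read(fromForm2, "UTF-8").equals(typed));      // true

        // Mismatched charsets give a different ("corrupted") string.
        System.out.println(read(fromForm2, "ISO-8859-1").equals(typed)); // false
        System.out.println(read(fromForm1, "UTF-8").equals(typed));      // false
    }
}
```

If this picture is right, the same three typed characters arrive as different bytes depending on the form, and the value seen by the servlet depends on which charset is assumed when the body is decoded.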

Or maybe my question should be : /can/ the servlet (or one of the filters) do anything 
that would cause the value of "name1" to /not/ be a correct Java "TÜV" string in the servlet ?

Additional information :
Only the servlet was updated.  Prior to that update, the application worked correctly. So 
I strongly suspect that it is the updated servlet which creates the problem.  But I'd like 
to understand /how/ it can create such a problem, and if for example something in filter1 
or filter2 could contribute to the problem, or not.
Filter1 is an authentication servlet filter, and as far as I know it only checks HTTP 
headers, and does not concern itself with the body of the request.  But I suppose that 
even the request body "passes through" this filter, and that it could presumably corrupt 
this body (although I would consider this unlikely right now).
Filter2 is my own filter (and I am not a Java expert).  This filter works at a number of 
installations (and also here, before this servlet update).  It subclasses the HTTP 
request, because it needs to add an HTTP header to the request, on-the-fly.  But the 
subclass only overrides the methods which have to do with the HTTP headers, and does not 
handle the body directly.

Any information or ideas welcome.
