tomcat-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: <?xml version="1.0" encoding="ISO-8859 in web.xml
Date Wed, 25 Jun 2008 09:22:12 GMT

Christopher Schultz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> 
> André Warnier wrote:
> | What else does need to be done at the Tomcat configuration level so that
> | it would handle UTF-8 requests properly, and produce UTF-8 responses
> | properly ?
> 
> <sigh> I hate responding with the same old stuff, but these sources of
> information really do cover everything we are perseverating over:
> 
> http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
> http://wiki.apache.org/tomcat/Tomcat/UTF-8
> Also:
> http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/
> 
The last reference (which I did not know) is excellent.  Thank you.

But the other two references are, in my view, not good ones and not 
worth perseverating over.

The article at
 > http://wiki.apache.org/tomcat/Tomcat/UTF-8
is incorrect.  The second part (Alternative) was recently corrected for 
the better, but the very premise of the article is wrong and 
misleading.  A recent thread on this same list showed that one does not 
normally need a filter, and I would submit that using a filter as 
indicated will corrupt data in some instances.

In the article at
 > http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
there is also a problem in the form shown under the title

How can I test if my configuration will work correctly?

As another recent thread here demonstrated, the <form> tag as shown is 
missing an
enctype="multipart/form-data"
attribute.
Its absence will cause Tomcat to misinterpret the form data in some cases.
One could also argue that adding an
accept-charset="UTF-8"
attribute would make it even more failsafe.

In addition, the article repeats an often-seen mistake, which is to 
tell people that it's ok to send form data containing non-US-ASCII 
characters via a GET.  This is a recipe for problems; see the 
java.sun.com article mentioned above.

That article explains the basic reason why it is a problem: although 
rules exist (more or less) for how to encode non-ASCII data in URLs, 
a server that receives such a request has basically no idea how the 
URL was actually encoded, so it can only guess at how to decode it 
properly.
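
To make that concrete, here is a minimal sketch in plain Java (no Tomcat involved). The class and method names are made up for illustration, and it assumes Java 10 or later for the Charset overloads of URLEncoder/URLDecoder. It encodes the same word the way a browser might under two different page encodings, then decodes one of the results under the wrong assumption:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UrlCharsetDemo {

    // Percent-encode a value using the given charset, roughly as a browser
    // would for a page in that encoding (illustrative helper).
    static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    // Decode a percent-encoded value under the charset the server assumes.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"

        // The same text produces two different URLs depending on the charset.
        String sentAsUtf8 = encode(original, StandardCharsets.UTF_8);        // caf%C3%A9
        String sentAsLatin1 = encode(original, StandardCharsets.ISO_8859_1); // caf%E9
        System.out.println(sentAsUtf8);
        System.out.println(sentAsLatin1);

        // Nothing in the URL says which charset was used. If the server
        // guesses ISO-8859-1 for the UTF-8 form, each byte becomes its own
        // character and the string comes back as "caf\u00c3\u00a9".
        String wrongGuess = decode(sentAsUtf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrongGuess.equals(original)); // false

        // Only the matching charset restores the original text.
        String rightGuess = decode(sentAsUtf8, StandardCharsets.UTF_8);
        System.out.println(rightGuess.equals(original)); // true
    }
}
```

Both encoded forms are perfectly legal URLs, which is exactly why the receiving side cannot tell them apart without outside information.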

This is also explicitly discouraged in the HTML 4.01 specification, 
section 17.13.4 "Form content types":
http://www.w3.org/TR/html401/interact/forms.html#submit-format

Now, I know that these are Wiki articles and can be corrected by anyone, 
but isn't that a problem? For better or worse, these articles are used 
as references by Tomcat users.  See your own response above.
If someone goes ahead and posts incorrect technical material there, 
that is a problem, no?
I mean that I, as a mere user, don't feel at ease unilaterally 
modifying someone else's Wiki article, or posting another one saying 
the previous one is all wrong.  But maybe there should be some form of 
authoritative control over the accuracy of what is posted there?

André



