tomcat-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Character encoding for POST x-www-form-urlencoding (a success story)
Date Fri, 12 Feb 2010 22:23:54 GMT

All,

My company recently decided to alter our password complexity
requirements for our webapp, and I got to implement the changes. What fun!

We use a regular expression to enforce our password complexity, and it
needed to be changed. Since we are starting to branch out into
populations that aren't necessarily using written English everywhere, I
chose to change our naive [a-z]- and [A-Z]-type checking to a more
enlightened \p{Ll} and \p{Lu}, respectively. (Readers' note: jakarta-oro
does not support this notation, so you'll want to use Java's built-in
regular expression support to do this).
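
A minimal sketch of the difference, using java.util.regex. The class and
method names are hypothetical, and the patterns are not the actual
complexity rule from this message; they only show that \p{Ll} and \p{Lu}
match letters that [a-z] and [A-Z] miss.

```java
import java.util.regex.Pattern;

class UnicodeCaseCheck {
    // Hypothetical patterns for illustration; the real policy is not shown in this message.
    private static final Pattern ASCII_LOWER = Pattern.compile("[a-z]");
    private static final Pattern UNICODE_LOWER = Pattern.compile("\\p{Ll}");
    private static final Pattern UNICODE_UPPER = Pattern.compile("\\p{Lu}");

    static boolean hasAsciiLower(String s) {
        return ASCII_LOWER.matcher(s).find();
    }

    static boolean hasLower(String s) {
        return UNICODE_LOWER.matcher(s).find();
    }

    static boolean hasUpper(String s) {
        return UNICODE_UPPER.matcher(s).find();
    }

    public static void main(String[] args) {
        // A '1' digit followed by seven Greek small letter pi characters,
        // written with escapes so the source file encoding does not matter.
        String password = "1\u03c0\u03c0\u03c0\u03c0\u03c0\u03c0\u03c0";

        System.out.println(hasAsciiLower(password)); // false: pi is outside a-z
        System.out.println(hasLower(password));      // true: pi is in category Ll
        System.out.println(hasUpper(password));      // false: no uppercase letter
        System.out.println(hasUpper("\u03a0"));      // true: capital Pi is in category Lu
    }
}
```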

Anyhow, when making changes to things security-related, it pays to test
/everything/, so I grabbed 4 other people from my group and had them
each test 15 sample passwords against our 6 different forms that accept
password-change entry. Everything went fine.

Except when I then tried to log in from our home page with the password
"1πππππππ" (that's a '1' digit followed by 7 Greek Pi characters, in
case your email reader can't render that), and I got a failure. I
figured I must have fat-fingered something, so I tried again and all was
well.

My spidey-sense tingling, I logged out and repeated the process: again,
my first login attempt was unsuccessful, while the second was. Hmm. Upon
closer inspection, our opening page is a static HTML file served by
Apache httpd -- no Tomcat involvement. After a failed login, a page that
looks exactly like the home page is sent to the user, but it's
different: /and/ it's served by Tomcat.

The difference was that the original request's response (for
/index.html) had a Content-Type of "text/html", while the failed login
had a response Content-Type of "text/html; charset=UTF-8".

It's our old pal "what's the default encoding, again?" coming back to
haunt me, and here I am telling people on this list that they just don't
understand the history of the web and how to do things properly.
Evidently, I wasn't doing them properly, either.

All those complaints about the way that URL-encoded GET parameters can
get messed up based upon Content-Type and encoding guesses, etc., and the
claim that the solution is just to use POST: that is, well, only half
the truth. Yes, POST gets you away from the browser's preference for what
encoding to use before URL-encoding the bytes, but with POST the
Content-Type is application/x-www-form-urlencoded, which carries no
charset, so the server still has to guess. :(
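
To make the failure mode concrete, here is a small sketch using only
java.net.URLEncoder and java.net.URLDecoder. The class and helper names
are hypothetical. It shows that the bytes on the wire are plain ASCII
percent-escapes, and that the same escapes decode to different strings
depending on which charset the receiver assumes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class FormEncodingDemo {
    // Percent-encode a value using the named charset.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Percent-decode a value, interpreting the bytes in the named charset.
    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String pi = "\u03c0"; // Greek small letter pi

        // What a client that encodes the form as UTF-8 would send.
        String wire = encode(pi, "UTF-8");
        System.out.println(wire); // %CF%80

        // A receiver that also assumes UTF-8 gets the original character back.
        System.out.println(decode(wire, "UTF-8").equals(pi)); // true

        // A receiver that assumes ISO-8859-1 gets two unrelated characters.
        String wrong = decode(wire, "ISO-8859-1");
        System.out.println(wrong.length());   // 2
        System.out.println(wrong.equals(pi)); // false
    }
}
```

Nothing in the escaped text itself says which of the two decodings is
the right one, which is why both sides have to agree out of band.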

So, what's to be done?

Well, I immediately thought of two solutions:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
and
<form accept-charset="UTF-8">

Knowing that web browsers are notoriously inconsistent with one another
regarding certain things, I was sure that I'd have a giant mess when it
came to testing, and that I'd have to figure out how to trick each
version of each browser into doing my bidding.

First, I had to make sure that they all /failed/ in the same way (that
is to say, that the login failed the way I expected it to fail), then I
had to see what magical incantations would be necessary to actually get
the login to succeed.

I'm happy to report that, for /all/ of the following browsers, */both/*
solutions worked!

Mozilla Firefox 2.0
Mozilla Firefox 3.0
Mozilla Firefox 3.5
Mozilla Firefox 3.6
Opera 9.6
Opera 10.10
Apple Safari 3.2
Apple Safari 4.0
Google Chrome 4.0
MSIE 6.0
MSIE 7.0
MSIE 8.0

I'm inclined to use the <form accept-charset="UTF-8"> solution, because
that does not involve lying to the browser about the encoding of the
actual HTML document. Instead, I'd rather advertise that I will only
accept UTF-8 encoding and leave it at that. Sadly, the client still
doesn't tell me that the underlying encoding being used to urlencode the
POST parameters is UTF-8, but at least they're doing what I want them to
do, and they all agree on behavior!
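
Since the request does not name its charset, the server side has to pick
one. The sketch below uses a hypothetical helper, charsetOf, to show the
decision: take the charset parameter from the request's Content-Type
when there is one, otherwise fall back to a configured default. In a
servlet container the analogous step would typically be a call to
request.setCharacterEncoding("UTF-8") before any parameter is read; that
is an assumption about how such an app might be wired, not something
stated in this message.

```java
import java.util.Locale;

class RequestCharset {
    // Returns the charset parameter of a Content-Type header value,
    // or the given fallback when the header has no charset parameter.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // A typical form POST: no charset, so the fallback is used.
        System.out.println(charsetOf("application/x-www-form-urlencoded", "UTF-8")); // UTF-8

        // A header that does carry a charset wins over the fallback.
        System.out.println(charsetOf("text/html; charset=UTF-8", "ISO-8859-1")); // UTF-8
    }
}
```

The fallback only works if it matches what the form advertises in
accept-charset, so the two settings need to be kept in step.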

So, score 1 for standards, at least in this instance.

-chris
