tomcat-users mailing list archives

From André Warnier
Subject Re: [slightly OT] FORM based authentication and utf-8 encoding of credentials
Date Wed, 26 Jun 2013 12:01:26 GMT
Jan Vávra wrote:
> Hello,
>>>> When I create a user whose password is the Czech string "ŽežUlička.1", the
>>>> browser correctly sends this string as:
>>>> POST http://localhost:70/myapp/j_security_check HTTP/1.1
>>>> Content-Type: application/x-www-form-urlencoded
>>>> j_username=p&j_password=%C5%BDe%C5%BEUli%C4%8Dka.1
>> The browser is not sending that correctly. The password is UTF-8 encoded
>> but the Content-Type fails to specify the character set used. If it did,
>> Tomcat would treat the password as UTF-8.
>> This is a common failing of browsers and is covered in the FAQ. [1]
>  Well I have tried IE, Firefox, Chrome. None of them is appending 
> charset in Content-Type.
>  I have manually modified the request header to:
> Content-Type: application/x-www-form-urlencoded; charset=utf-8
> and Tomcat gives me the letters in the correct form. Ok, good to know.
>>>> Any idea how to tell tomcat to use utf-8 in form based authentication?
>>>> It's tomcat 7.0.34 on Czech Windows 7 32 bit with default ansi code
>>>> page set as Windows-1250.
>> Authentication is tricky because the processing happens before any user
>> code runs. The best / only option is to set the characterEncoding
>> attribute for the Authenticator [2] to UTF-8 and hope that the browsers
>> are consistent in their failing to follow the specification and use
>> whatever encoding the page is encoded with.
>> HTH,
>> Mark
>> [1]
>> [2]

> As you suggested in [2], I have added to my app's context xml
> <Valve className="org.apache.catalina.authenticator.FormAuthenticator" 
> characterEncoding="utf-8"/>
> and Czech letters are in the correct form. This is a solution.
> Thanks for the advice.
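
For anyone reading this thread later, here is a minimal Java sketch of why the missing
charset matters. It decodes the j_password value from the request quoted above twice:
once as UTF-8 and once as ISO-8859-1. That second charset is an assumption on my part
(it is the default the servlet specification prescribes when the request names no
charset). The class name FormDecodeDemo and the helper decode() are hypothetical names
used only for this illustration; this is not Tomcat's own code path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodeDemo {

    // Decodes one application/x-www-form-urlencoded value using the given charset name.
    static String decode(String encoded, String charsetName) throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The j_password value exactly as it appears in the quoted request.
        String raw = "%C5%BDe%C5%BEUli%C4%8Dka.1";

        // What the user typed, recovered only when the bytes are read as UTF-8.
        String asUtf8 = decode(raw, "UTF-8");

        // Assumed fallback: each UTF-8 byte is read as a separate character.
        String asLatin1 = decode(raw, "ISO-8859-1");

        System.out.println("UTF-8 decode length: " + asUtf8.length());
        System.out.println("ISO-8859-1 decode length: " + asLatin1.length());
        System.out.println("Same string? " + asUtf8.equals(asLatin1));
    }
}
```

The two results differ because every two-byte UTF-8 sequence becomes two characters
under the single-byte charset, so the password no longer matches the stored one.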

By the way, regarding this basic failing of browsers: it is clearly contrary to the
specs, yet for years all major browsers have consistently ignored the issue.
The failure to add the character set/encoding to HTTP POSTs causes problems in
multilingual web applications, and forces multiple workarounds which are themselves
perforce inconsistent.
Does anyone have an idea why browsers keep ignoring this issue, version after version?

(I would imagine that Apache httpd and Tomcat devs must have regular contacts with 
whoever develops browsers, so did anyone ever ask?)
