tomcat-users mailing list archives

From André Warnier
Subject Re: Basic Authentication Failed with multibyte username
Date Thu, 21 Jan 2010 11:55:18 GMT
Mark Thomas wrote:
> On 21/01/2010 06:12, André Warnier wrote:
>> Auth Gábor wrote:
>>> Hi,
>>> I've found a potential bug in the Basic Authentication module. Some of
>>> my users have usernames that contain national characters (encoded in
>>> UTF-8). HTTP header based authentication fails when the username or
>>> the password contains multibyte characters.
>>> The root of the bug is the Base64 decoder, which decodes the Base64
>>> stream to a char array: it converts each byte to an individual char,
>>> and this corrupts the multibyte characters...
>> Hi.
>> Before declaring that this is a bug, I suggest that you read the other
>> thread entitled "mod_jk codepage in header values".
>> The main point is : according to the HTTP RFCs, a HTTP header value is
>> supposed to contain /only/ US-ASCII characters. Some byte values in
>> UTF-8 encoding are /not/ valid US-ASCII characters, so strictly speaking
>> and according to the RFC, HTTP headers which would contain them are
>> invalid.
>> It's a pain, but it's (probably) not a bug.
> In this case I think it is a bug. The authorisation header is base64
> encoded so it is automatically compliant with RFC2616.
Yes, it sounds like you're right; my mistake.
(Also for Gabor, I admit my mistake.)

I agree that the HTTP header itself is correct.
But there is still something that puzzles me in principle.
Suppose that the browser and the server know nothing particular about
one another, and that the server gets such an Authorization header from
the browser.
The Base64 decoding is done, and yields a series of bytes.
Now this series of bytes has to be interpreted, to be translated into a
string in Java (which is Unicode). Which encoding should be chosen to
decode the byte array?
If you use the default platform JVM encoding, you are making the
assumption that the browser knew what this encoding is, aren't you?
On the other hand, the browser sent nothing to indicate which encoding
this string was in before it encoded it using Base64, or did it?
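
To make the question concrete, here is a minimal Java sketch. It is only an illustration, not Tomcat's code: it assumes Java 8 or later for java.util.Base64, the class and method names are hypothetical, and the credentials are made up. It decodes the same header value three ways: byte-per-char (as described in the bug report), as ISO-8859-1, and as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; not part of Tomcat.
public class BasicAuthCharsetDemo {

    // Strip the "Basic " prefix and Base64-decode the rest into raw bytes.
    public static byte[] rawCredentials(String headerValue) {
        String base64 = headerValue.substring("Basic ".length()).trim();
        return Base64.getDecoder().decode(base64);
    }

    // Interpret the decoded bytes using an explicitly chosen charset.
    public static String decodeCredentials(String headerValue, Charset charset) {
        return new String(rawCredentials(headerValue), charset);
    }

    // Mimic the conversion described in the report: one char per byte.
    public static String decodeBytePerChar(String headerValue) {
        byte[] raw = rawCredentials(headerValue);
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append((char) (b & 0xFF)); // each byte becomes its own char
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the browser encoded "user:password" as UTF-8
        // before applying Base64. \u00e1 is "a" with an acute accent.
        String original = "G\u00e1bor:titok";
        String header = "Basic " + Base64.getEncoder()
                .encodeToString(original.getBytes(StandardCharsets.UTF_8));

        String perChar = decodeBytePerChar(header);
        String latin1 = decodeCredentials(header, StandardCharsets.ISO_8859_1);
        String utf8 = decodeCredentials(header, StandardCharsets.UTF_8);

        System.out.println("byte-per-char matches: " + original.equals(perChar));
        System.out.println("ISO-8859-1 matches:    " + original.equals(latin1));
        System.out.println("UTF-8 matches:         " + original.equals(utf8));
    }
}
```

Only the UTF-8 decoding recovers the original string here, and only because the sketch assumes the client used UTF-8. The byte-per-char conversion gives the same result as decoding with ISO-8859-1, so the two-byte character comes out as two separate chars. Nothing in the header itself tells the server which charset the client used.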
