tomcat-users mailing list archives

From André Warnier
Subject Re: Basic Authentication Failed with multibyte username
Date Thu, 21 Jan 2010 14:11:15 GMT
Mark Thomas wrote:
> On 21/01/2010 06:55, André Warnier wrote:
>> Mark Thomas wrote:
>>> The authorisation header is base64
>>> encoded so it is automatically compliant with RFC2616.
>> Yes, it sounds like you're right; my mistake.
>> (Also for Gabor, I admit my mistake.)
>> I agree that the HTTP header itself is correct.
>> But there is still something which puzzles me in the absolute.
>> Suppose that the browser and the server know nothing particular about
>> one another, and that the server gets such an Authentication header from
>> the browser.
>> The Base64 decoding is done, and yields a series of bytes.
>> Now this series of bytes has to be interpreted, to be translated into a
>> string in Java (which is Unicode).  Which encoding should be chosen to
>> decode the byte array ?
>> If you use the default platform JVM encoding, you are making the
>> assumption that the browser knew what this encoding is, aren't you ?
>> On the other hand, the browser sent nothing to indicate in which
>> encoding this string was, before it encoded it using Base64, or did it ?
> RFC2617 to the rescue...
>       basic-credentials = base64-user-pass
>       base64-user-pass  = <base64 [4] encoding of user-pass,
>                           except not limited to 76 char/line>
>       user-pass         = userid ":" password
>       userid            = *<TEXT excluding ":">
>       password          = *TEXT
> *TEXT is defined in RFC2616
>        TEXT           = <any OCTET except CTLs,
>                         but including LWS>
> and finally
>        OCTET          = <any 8-bit sequence of data>
>        CTL            = <any US-ASCII control character
>                         (octets 0 - 31) and DEL (127)>
> So actually, Tomcat is correct in the current treatment of credentials.
> Therefore, not a bug.
> Also André's comments regarding ISO-8859-1 were right if considering the
> actual user name and password rather than the header.
> Supporting other encodings would be a useful enhancement but the default
> will have to be ISO-8859-1 to remain spec compliant. What the browsers
> will do for user names and passwords in other encodings is not defined
> so it will be a case of YMMV.
> Mark
Let me be even more pernickety :

According to the HTTP 1.1 RFC 2616, words of *TEXT in HTTP header fields 
MAY contain characters from character sets other than ISO-8859-1.
But then, such header field values MUST be encoded according to the 
rules of RFC 2047.

RFC 2047 in turn, in "2. Syntax of encoded-words", indicates that this 
should be done using the form :
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
for example :

Header-name: =?iso-8859-1?B?some iso-8859-1 text, base-64 encoded?=
Header-name: =?utf-8?B?some unicode/utf-8 text, base-64 encoded?=
(I am not quite sure here of the "utf-8" part as the correct name for 
the charset.)
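
To make the encoded-word form concrete, here is a minimal Java sketch that builds such a value with the "B" (base64) encoding. The class and method names are made up for illustration, and the sketch ignores the length limit that RFC 2047 places on a single encoded-word, so take it as an illustration rather than a complete encoder.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

// Hypothetical helper, not part of Tomcat or of any RFC.
final class EncodedWordDemo {

    /** Builds "=?charset?B?base64-of-the-text-bytes?=" for the given charset. */
    static String encodeWord(String text, Charset charset) {
        // The text is first turned into bytes in the named charset,
        // and only those bytes are base64-encoded.
        String encodedText = Base64.getEncoder().encodeToString(text.getBytes(charset));
        String charsetName = charset.name().toLowerCase(Locale.ROOT);
        return "=?" + charsetName + "?B?" + encodedText + "?=";
    }

    public static void main(String[] args) {
        // The same text gives different encoded-text depending on the charset.
        System.out.println(encodeWord("Andr\u00e9", StandardCharsets.ISO_8859_1));
        System.out.println(encodeWord("Andr\u00e9", StandardCharsets.UTF_8));
    }
}
```

The point is that the charset name travels with the value, so the receiver does not have to guess how to turn the decoded bytes back into characters.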

(Note: that is something one does find regularly in email headers; but I 
have never seen it used in HTTP headers until now.)

On the other hand, regarding authentication mechanisms, RFC 2616 refers 
to RFC 2617, which itself indicates the following format for an 
authorization header sent by the browser to the server :

Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

When base64-decoded, the above string should look like "userid:password".
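
As a rough sketch of why the charset question matters here, the Java below decodes the header value to bytes and then interprets the same bytes once as ISO-8859-1 and once as UTF-8. The class and method names are invented for this example, and this is not how Tomcat itself is written.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; the names are illustrative only.
final class BasicCredentialsDemo {

    /** Strips the "Basic " scheme prefix and returns the raw credential bytes. */
    static byte[] decodeCredentialBytes(String headerValue) {
        String prefix = "Basic ";
        if (!headerValue.regionMatches(true, 0, prefix, 0, prefix.length())) {
            throw new IllegalArgumentException("not a Basic Authorization header");
        }
        return Base64.getDecoder().decode(headerValue.substring(prefix.length()).trim());
    }

    public static void main(String[] args) {
        // Pure US-ASCII credentials: every common charset gives the same string.
        byte[] ascii = decodeCredentialBytes("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==");
        System.out.println(new String(ascii, StandardCharsets.ISO_8859_1)); // Aladdin:open sesame

        // Assume a browser that encoded a non-ASCII userid as UTF-8 before base64.
        String sent = "Basic " + Base64.getEncoder()
                .encodeToString("Andr\u00e9:secret".getBytes(StandardCharsets.UTF_8));
        byte[] raw = decodeCredentialBytes(sent);

        // Nothing in the header says which of these two readings is the right one.
        System.out.println(new String(raw, StandardCharsets.UTF_8));
        System.out.println(new String(raw, StandardCharsets.ISO_8859_1));
    }
}
```

The last two lines print different userids from identical bytes, which is exactly the ambiguity under discussion.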

I did not find in RFC 2617 any specific mention of character set 
encoding, but it itself refers back to RFC 2616 as being the "base 
rules". And the base rules in RFC 2616 seem to be that header values are 
ISO-8859-1 unless otherwise indicated.

In other words, my contention is as follows :

- if the "userid:password" string above contains only ISO-8859-1 
characters, then the above simple form of the header is fine.
- if the "userid:password" string above contains characters outside 
ISO-8859-1, however, then they should be further encoded, using the 
rules of RFC 2047.
This would mean that you should have something like :

Authorization: Basic =?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=

(or, maybe, the other way around : it is the 
"QWxhZGRpbjpvcGVuIHNlc2FtZQ==" string which, when base64-decoded, should 
yield a new string of the form 
"=?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=", which should then be decoded 
once more to give the "userid:password" string).
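
For what it is worth, a server-side decoder for the first of these two hypothetical forms could look roughly like the Java sketch below. It is only an assumption of how such a header might be handled: no browser is known to send it, the class and method names are invented, and the doubly-wrapped variant is not covered.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical decoder for "Basic =?charset?B?...?=", not an existing Tomcat API.
final class EncodedWordBasicAuth {

    // charset, then the "B" encoding marker, then the base64 text.
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?[Bb]\\?([^?]*)\\?=");

    /** Returns "userid:password", using the declared charset when one is present. */
    static String decodeCredentials(String headerValue, Charset fallback) {
        String prefix = "Basic ";
        if (!headerValue.regionMatches(true, 0, prefix, 0, prefix.length())) {
            throw new IllegalArgumentException("not a Basic Authorization header");
        }
        String token = headerValue.substring(prefix.length()).trim();
        Matcher m = ENCODED_WORD.matcher(token);
        if (m.matches()) {
            // The charset named in the encoded-word decides how bytes become characters.
            Charset declared = Charset.forName(m.group(1));
            return new String(Base64.getDecoder().decode(m.group(2)), declared);
        }
        // Plain RFC 2617 form: nothing is declared, so the caller's fallback is used.
        return new String(Base64.getDecoder().decode(token), fallback);
    }
}
```

Called with a fallback of ISO-8859-1, this keeps today's behaviour for the plain form and only switches charset when the header explicitly asks for it.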

Now, I am not sure that, if you pass such an HTTP header, encoded as 
above, from Apache to Tomcat, the Tomcat getHeader() call will 
properly decode it, using the indicated charset.

And I am not sure either that there exists any browser on the market 
that will encode a userid:password string that way.
