tomcat-users mailing list archives

From Christopher Schultz <>
Subject Re: [OT] Basic Authentication Failed with multibyte username
Date Fri, 22 Jan 2010 21:16:20 GMT


(Marking OT because, well... just because).

On 1/22/2010 2:59 PM, Warnier wrote:
> Christopher Schultz wrote:
>> That "authorization.getBytes()" is just asking for trouble, because it
>> uses the platform default encoding to convert characters to bytes. It
>> should be using US-ASCII, ISO-8859-1, or something like that.
> -1
> I don't think you have a problem there, because what you are decoding
> into bytes there IS bytes (it is base64-encoded).

Maybe all character sets have bytes 0-127 the same as US-ASCII, but I
don't know about some of those I never see myself: Shift-JIS and all
those Asian encodings, etc. It would be better to be explicit.
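
A minimal sketch of what "being explicit" could look like. The class and
method names are hypothetical, and StandardCharsets needs Java 7 or
later, which is newer than this thread. UTF-16BE stands in for "some
default encoding that is not ASCII-compatible"; that choice is only an
illustration, not something taken from the Tomcat code.

```java
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {
    // Convert the Base64 text of an Authorization header to bytes
    // without consulting the platform default encoding.
    static byte[] headerBytes(String authorization) {
        return authorization.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String token = "dXNlcjpwYXNz"; // Base64 of "user:pass"

        byte[] ascii = headerBytes(token);
        // The same characters under an encoding that does not share
        // its single-byte values with US-ASCII.
        byte[] utf16 = token.getBytes(StandardCharsets.UTF_16BE);

        System.out.println("US-ASCII bytes: " + ascii.length);
        System.out.println("UTF-16BE bytes: " + utf16.length);
    }
}
```

With the charset named in the call, the result no longer depends on how
the JVM happens to be configured.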

>> It also calls the String constructor with a byte array without
>> specifying the encoding, therefore using the platform default.
> +1
> That is indeed where you have a problem.  There you SHOULD always decode
> it as US-ASCII (or maybe iso-8859-1, I'm not quite sure what the spec
> says exactly).

From my reading, the spec is silent, but one can draw the conclusion that
US-ASCII is basically all that is supported. I should add the capability
of configuring this encoding to override the (soon to be) default of
US-ASCII: if the user knows the client will use UTF-8, they should be
allowed to force that encoding to be used.
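
A rough sketch of such a configurable setting, assuming a hypothetical
BasicCredentialsDecoder class (this is not Tomcat's authenticator) and
java.util.Base64 from Java 8 or later. The default is US-ASCII and the
caller may override it, for example with UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicCredentialsDecoder {
    private final Charset credentialCharset;

    // Default: treat the decoded "userid:password" bytes as US-ASCII.
    BasicCredentialsDecoder() {
        this(StandardCharsets.US_ASCII);
    }

    // Override for deployments where the client is known to send
    // another encoding, such as UTF-8.
    BasicCredentialsDecoder(Charset credentialCharset) {
        this.credentialCharset = credentialCharset;
    }

    // Returns a two-element array: { userid, password }.
    String[] decode(String headerValue) {
        if (headerValue == null
                || !headerValue.regionMatches(true, 0, "Basic ", 0, 6)) {
            throw new IllegalArgumentException("not a Basic header");
        }
        String token = headerValue.substring(6).trim();
        byte[] raw = Base64.getDecoder().decode(token);

        // The charset is always named here, never the platform default.
        String credentials = new String(raw, credentialCharset);

        int colon = credentials.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing ':' separator");
        }
        return new String[] {
            credentials.substring(0, colon),
            credentials.substring(colon + 1)
        };
    }

    public static void main(String[] args) {
        String header = "Basic " + Base64.getEncoder().encodeToString(
                "ren\u00e9:secret".getBytes(StandardCharsets.UTF_8));

        String[] utf8 = new BasicCredentialsDecoder(StandardCharsets.UTF_8)
                .decode(header);
        String[] ascii = new BasicCredentialsDecoder().decode(header);

        System.out.println("UTF-8 userid:    " + utf8[0]);
        System.out.println("US-ASCII userid: " + ascii[0]);
    }
}
```

With the US-ASCII default, the non-ASCII bytes of the userid come out as
replacement characters, so the lookup fails cleanly instead of varying
with the server's locale.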

> Let's say that the spec is clear and says that the header value is
> *TEXT, and that *TEXT is always US-ASCII (or ISO-8859-1) by default.
> Let's take it from the browser side first.
> If the "userid:password" is indeed composed only of us-ascii characters,
> then the browser base64-encodes this directly and it is trivial.(*)
> But let's say that "userid:password" is something else than us-ascii.
> Another part of the spec says that then, you have to encode it according
> to RFC2047.

No, I don't think this is correct: the spec says that the HTTP header
values must be in US-ASCII, and may be encoded using RFC2047 in order to
achieve that. Since Base64 encoding always results in a
US-ASCII-compatible value, there is no reason to involve RFC2047.
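
To illustrate the point about Base64, here is a small sketch with
hypothetical helper names, again assuming java.util.Base64 from Java 8
or later. Whatever bytes the client produces for "userid:password", the
token that goes into the header contains only US-ASCII characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64IsAsciiDemo {
    // Build the Base64 token a client would send, using the given
    // charset for the "userid:password" string.
    static String basicToken(String userid, String password, Charset cs) {
        byte[] raw = (userid + ":" + password).getBytes(cs);
        return Base64.getEncoder().encodeToString(raw);
    }

    // True if every character is in the 7-bit US-ASCII range.
    static boolean isUsAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A non-ASCII userid and password, written with escapes so the
        // source file itself stays ASCII.
        String token = basicToken("\u65e5\u672c", "\u5bc6",
                StandardCharsets.UTF_8);

        System.out.println("token: " + token);
        System.out.println("ASCII only: " + isUsAscii(token));
    }
}
```

The open question is therefore not the header itself but which charset
the server should use after Base64 decoding.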

> My contention is then that the browser should first RFC2047-encode
> "userid:password", and then base64-encode the result.

While that sounds like a good idea, it's almost certainly never done
that way.

> Back on the server side.
> The server base64-decodes the authorization token, into an ascii string.
> It can do that always, because either the string was ascii to start
> with, or else it was not, but then it has been RFC2047-encoded, yielding
> a result that is ascii.
> (like : =?iso-8859-2?B?....base64-encoded stuff...?= )

This would be a decent configurable setting for a BASIC authenticator...
something like "allow-rfc2047" or whatever. What about those people who
really want to have a username like "=?whatever" and a password like
"whatever?="? They can't login? :)

> The above, I believe, would be totally consistent with the current RFCs.

Yes, but for whatever reason, nobody ever fully implements the RFCs :)
There are standards and there are practices. In this case, I think
practices outweigh the standards :)

> But there is a major catch : I don't believe that there is a browser on
> the market today, which "properly" encodes the "userid:password" string
> via rfc2047 when it isn't ascii.

Nor would it be appropriate to do so, because base64 encoding is
/always/ used and will therefore /always/ result in a valid HTTP
Authorization header value.

-chris