tomcat-users mailing list archives

From André Warnier
Subject Re: mod_jk codepage in header values
Date Thu, 21 Jan 2010 14:21:51 GMT
Mirko Solic wrote:
> On Thu, 2010-01-21 at 11:30 +0100, André Warnier wrote:
Just for info: there is another related thread going on at the same 
time, entitled "Basic Authentication Failed with multibyte username".

Basically, I am interested in these topics because I often run into 
them myself in our own web applications.
I don't know all the answers, but I do know that it is confusing.

As far as I can tell:

According to the HTTP 1.1 RFC 2616 (section 2.2), HTTP header fields MAY 
contain *TEXT portions with characters from character sets other than 
ISO-8859-1, but only when those portions are encoded according to the 
rules of RFC 2047.
RFC 2047 in turn, in "2. Syntax of encoded-words", indicates that this 
should be done using the form:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
for example:

Header-name: =?iso-8859-1?B?some iso-8859-1 text, base-64 encoded?=
Header-name: =?utf-8?B?some unicode/utf-8 text, base-64 encoded?=
(I am not quite sure here of the "utf-8" part as the correct name for 
the charset.)
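
To make the form above concrete, here is a minimal sketch that builds such 
an encoded-word with the "B" (base64) encoding, using only the standard 
Java library. The class and method names (EncodedWordExample, encodeWord) 
are made up for illustration. As far as I know, RFC 2047 treats charset 
names as case-independent, so "utf-8" and "UTF-8" should be equivalent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodedWordExample {

    // Builds a single RFC 2047 encoded-word using the "B" (base64) encoding.
    // Limitation of this sketch: RFC 2047 caps an encoded-word at 75
    // characters; longer text would have to be split into several
    // encoded-words, which is not done here.
    static String encodeWord(String text, Charset charset) {
        String b64 = Base64.getEncoder().encodeToString(text.getBytes(charset));
        return "=?" + charset.name() + "?B?" + b64 + "?=";
    }

    public static void main(String[] args) {
        // "Andr\u00e9" is "André", written with an escape so the source
        // file encoding does not matter.
        String name = "Andr\u00e9";
        // prints: Header-name: =?ISO-8859-1?B?QW5kcuk=?=
        System.out.println("Header-name: " + encodeWord(name, StandardCharsets.ISO_8859_1));
        // prints: Header-name: =?UTF-8?B?QW5kcsOp?=
        System.out.println("Header-name: " + encodeWord(name, StandardCharsets.UTF_8));
    }
}
```

The same text gives a different encoded-text for each charset, which is 
why the charset has to travel along inside the header value.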

Now, I am not sure that, if you pass an HTTP header encoded as above 
from Apache to Tomcat, the getHeader() call in Tomcat will properly 
decode it using the indicated charset.

If not, you will have to do the decoding yourself, if you want to pass 
non-ascii (or non-iso-8859-1) characters in those headers.
Admittedly, it is a pain; but there are still quite a few grey areas 
like that in the WWW-related RFCs where character sets are concerned.
If you have to do this kind of encoding/decoding, I suggest having a 
look at MIME (email) libraries.  This kind of encoding/decoding is 
regularly used in email headers.  For an example, save an email with a 
non-ascii subject line in its original text (.eml) format and look at 
the Subject header.
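
If you do end up decoding by hand, the following is a rough sketch of what 
that could look like with only the standard Java library. It is not what 
Tomcat or mod_jk does; the names (EncodedWordDecoder, decodeHeader, decodeQ) 
are invented for this example, and it skips parts of RFC 2047, such as 
ignoring whitespace between adjacent encoded-words. A MIME library 
(JavaMail's MimeUtility.decodeText(), for instance) is probably the safer 
choice in real code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodedWordDecoder {

    // Matches =?charset?encoding?encoded-text?= with encoding B or Q.
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?([BbQq])\\?([^?]*)\\?=");

    // Replaces every encoded-word in the value with its decoded text and
    // leaves everything else untouched. Throws IllegalArgumentException
    // (or a subclass) on an unknown charset or invalid base64.
    static String decodeHeader(String value) {
        if (value == null) {
            return null;
        }
        Matcher m = ENCODED_WORD.matcher(value);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(value, last, m.start());
            Charset charset = Charset.forName(m.group(1));
            byte[] bytes = m.group(2).equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(m.group(3))
                    : decodeQ(m.group(3));
            out.append(new String(bytes, charset));
            last = m.end();
        }
        out.append(value, last, value.length());
        return out.toString();
    }

    // "Q" encoding: '_' stands for a space, "=XX" for the byte with hex value XX.
    static byte[] decodeQ(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                buf.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                buf.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                buf.write(c);
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // Both lines should print the same name, "André".
        System.out.println(decodeHeader("=?UTF-8?B?QW5kcsOp?="));
        System.out.println(decodeHeader("=?iso-8859-1?Q?Andr=E9?="));
    }
}
```

In a servlet you would pass the string returned by getHeader() through 
something like decodeHeader() before using it; a value without any 
encoded-word comes back unchanged.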
