tomcat-users mailing list archives

From André Warnier
Subject Re: mod_jk codepage in header values
Date Thu, 21 Jan 2010 10:30:05 GMT
Mirko Solic wrote:
> Christopher, thanks for the quick reply.

> I'm from Slovenia, Europe. We use characters that are not defined
> in ASCII, so we use the UTF-8 code page.
> I will try to explain what this application is about.
> This project (web page) is protected with AAI. This Authentication and
> Authorization Infrastructure is roughly divided into an SP (service
> provider) and an IdP (identity provider). The SP is a module in Apache.
> When a user tries to get a web page that is protected with AAI through
> Apache, the SP module checks whether the user is already logged in. If
> not, the SP redirects the user to the IdP, where the user can enter
> his/her username and password. If everything is OK, the IdP sends the
> user's data in XML to the SP. The SP puts this data into Apache
> environment variables so that applications (web pages) can access it.
> Here I use mod_jk to get these environment variables in Tomcat as HTTP
> headers. If I print the user data on the Apache side, I get values in
> UTF-8 encoding, but if I try this in Tomcat I don't get the right
> values; I have to make a conversion.
> Is mod_jk responsible for converting UTF-8 environment variables to
> ASCII header values, or is this conversion made elsewhere?
I am from Europe too (Belgium). I live in Spain and work mostly for 
German and other international customers (some of them from 
Poland). This is to say that I am well aware of multilingual character 
set issues, and I confront them every day.
So, to give you some context for your issue:

Although Unicode and UTF-8 are increasingly used on the web, HTTP, 
HTML, and most of the other WWW-relevant RFCs are still based on 
US-ASCII and ISO-8859-1 (Latin-1).

For example, HTTP header values are /supposed/ to contain only 
single-byte character codes from the printable subset of the 
US-ASCII character set.
Likewise, by default, all "content" exchanged between browsers 
and web servers is ISO-8859-1.
This is so because the relevant RFCs say that it should be.
(So the developers of Apache, mod_jk and Tomcat have little choice in 
the matter; they have to follow the RFCs.)

This does not mean that you cannot handle other character sets on the 
web.  But it means that whenever you do, you have to be attentive to the 
fact that it is /not/ the standard, and that you may have to do 
character set translations and/or encoding.
It may even mean that, in order to exchange non-US-ASCII or 
non-ISO-8859-1 data, you may have to use "tricks".
It also means that, in some cases, by using such "tricks", your 
applications may become "non-standard", and will not necessarily work 
with all servers and all clients.

So, to get back to your question above: mod_jk is not 
responsible for translating anything, and will not translate anything. 
That is because mod_jk follows the relevant WWW RFCs, which specify that 
the data in question is US-ASCII or ISO-8859-1.

If the original HTTP request, as Apache hands it to mod_jk, 
contains HTTP headers, mod_jk will forward these headers "as is" to the 
back-end Tomcat.  But, because the HTTP RFC specifies that HTTP headers 
should contain only US-ASCII character data, mod_jk would be allowed, if 
it finds non-US-ASCII data in an HTTP header, to strip this data, 
ignore the header, or something similar.  I don't know whether mod_jk 
actually does this, but if it did, it would be justified, because 
according to the HTTP RFC this would be an invalid header.
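
To make "non-US-ASCII data in a header" concrete, here is a minimal Java 
sketch of a check you could run on a value before putting it in a header. 
The class and method names are hypothetical (they are not part of mod_jk 
or Tomcat), and the check is a deliberately strict reading: it accepts 
only the printable range 0x20 to 0x7E and rejects everything else.

```java
final class HeaderAsciiCheck {

    // Returns true only when every character is printable US-ASCII (0x20..0x7E).
    // Hypothetical helper for illustration; not an API of mod_jk or Tomcat.
    static boolean isPrintableUsAscii(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Plain ASCII name: passes the check.
        System.out.println(isPrintableUsAscii("Mirko Solic"));
        // Same name with S-caron and c-acute: fails the check.
        System.out.println(isPrintableUsAscii("Mirko \u0160oli\u0107"));
    }
}
```

A value that fails this check is one that would need some form of 
encoding before it can safely travel in a header.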

So, to be practical:
- the current HTTP 1.1 RFC specifies that HTTP headers can only contain 
printable US-ASCII character data
- UTF-8 encodes every non-ASCII character as a sequence of bytes with 
values above 0x7F, which are not part of the US-ASCII character set
- so, if you want to forward such a header from Apache to Tomcat, in 
principle the "right" way is to "encode" the value of this header on the 
Apache side in such a way that it contains only US-ASCII data (for 
example, using Base64 encoding), then pass it to mod_jk
- at the other end, your application would have to decode this header 
(using Base64 decoding) back into UTF-8 bytes, and then read 
those bytes as UTF-8/Unicode text.
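
The encode/decode steps in the list above can be sketched in Java as 
follows. This is only an illustration under assumptions: it uses 
java.util.Base64 (Java 8 or later), the class and method names are 
hypothetical, and the "Apache side" is simulated in the same program, 
since how the value gets encoded inside Apache depends on your setup.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class HeaderBase64 {

    // Sender side (simulated here): UTF-8 bytes -> Base64 text, which is pure US-ASCII.
    static String encodeForHeader(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Receiver side (e.g. the web application): Base64 text -> UTF-8 bytes -> String.
    static String decodeFromHeader(String headerValue) {
        byte[] utf8 = Base64.getDecoder().decode(headerValue);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Mirko \u0160oli\u0107";   // contains non-ASCII characters
        String onTheWire = encodeForHeader(original); // safe to carry in a header
        String restored = decodeFromHeader(onTheWire);
        System.out.println(onTheWire);
        System.out.println(restored.equals(original));
    }
}
```

In a servlet, the input to decodeFromHeader would be whatever 
request.getHeader(...) returns for the header name you chose; that 
name is up to you and is not fixed by anything in this message.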

There is no guarantee that any standard module or class in Apache, 
mod_jk or Tomcat will properly handle a header that contains 
non-US-ASCII data.  That is because, in principle, they never have to.

I know it is a mess. It is possible that there are shortcuts.  It is 
possible that mod_jk would transmit an HTTP header even if it contains 
non-US-ASCII data. But that is not guaranteed, because "the bible" for 
mod_jk, as for Apache and for Tomcat, is the set of RFCs.
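
One such shortcut, and probably the "conversion" you already do, is to 
re-read the received value as UTF-8. The Java sketch below rests on two 
assumptions that nothing above guarantees: that the raw UTF-8 bytes 
reach the application unchanged, and that the receiving side has 
decoded them one byte per character as ISO-8859-1. The class and method 
names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class HeaderReinterpret {

    // Recovers text whose UTF-8 bytes were decoded as ISO-8859-1.
    // Works only if each original byte survived as exactly one char (0x00..0xFF).
    static String latin1ToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u010Dri\u010Dek"; // a word with two c-caron characters

        // Simulate the assumed failure: UTF-8 bytes read as ISO-8859-1.
        String seen = new String(original.getBytes(StandardCharsets.UTF_8),
                                 StandardCharsets.ISO_8859_1);

        System.out.println(seen.equals(original));               // garbled, so not equal
        System.out.println(latin1ToUtf8(seen).equals(original)); // recovered
    }
}
```

This is fragile: if any component strips or replaces the bytes above 
0x7F, the original text cannot be recovered this way, which is why the 
Base64 route is the safer one.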

We non-English speakers worldwide desperately need a new version of the 
HTTP protocol where the default would be Unicode/UTF-8, for everything.
But I do not see much happening right now in that direction.

Maybe a tip for your authentication issues:
If, in the AJP <Connector> on the Tomcat side, you set the attribute
then Tomcat will accept the user-id authenticated by Apache, as the 
user-id for Tomcat (mod_jk transmits it).
So if your user authentication mechanism works fine at the Apache level, 
and generates a user-id that is "acceptable" by Tomcat, this may be a 
solution for your issue.
I have no idea if this user-id, for Tomcat, can or cannot contain 
non-US-ASCII characters.
