tomcat-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: mod_jk codepage in header values
Date Mon, 01 Feb 2010 14:57:34 GMT

Rainer,

On 1/30/2010 7:56 AM, Rainer Jung wrote:
> So I expect you can forward any binary garbage you like, as long as you
> make sure the code putting it into the environment variables doesn't
> already do any encoding or decoding.

This was pretty much just as I expected.

> Now: it seems that Tomcat is by default assuming it needs to transform
> the binary AJP data stream for request attributes into ISO-8859-1
> decoded Java strings. I'm not 100% sure here, but this is likely the
> most important part of the game.

It looks like AprProtocol.java, in prepareRequest, handles request
attributes in the SC_A_REQ_ATTRIBUTE case. No encoding/decoding is done
there. Instead, it is done by the MessageBytes class, indirectly by the
ByteChunk class.

The documentation for ByteChunk says:

 * In a server it is very important to be able to operate on
 * the original byte[] without converting everything to chars.
 * Some protocols are ASCII only, and some allow different
 * non-UNICODE encodings. The encoding is not known beforehand,
 * and can even change during the execution of the protocol.
 * ( for example a multipart message may have parts with different
 *  encoding )
 *
 * For HTTP it is not very clear how the encoding of RequestURI
 * and mime values can be determined, but it is a great advantage
 * to be able to parse the request without converting to string.

Later:

    /** Default encoding used to convert to strings. It should be UTF8,
        as most standards seem to converge, but the servlet API requires
        8859_1, and this object is used mostly for servlets.
    */
    public static final String DEFAULT_CHARACTER_ENCODING="ISO-8859-1";

If ByteChunk.setEncoding has not been called, this default encoding is
used to decode bytes. Unfortunately, setEncoding is not static, so you
have to have a reference to the ByteChunk object in order to fix it.
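
To make the effect of that default concrete, here is a minimal sketch using
only the JDK (it does not touch ByteChunk itself). The class and method names
are made up for illustration, and StandardCharsets assumes Java 7 or later.
It shows what happens when the bytes of a UTF-8 value are decoded as
ISO-8859-1: each byte becomes its own char.

```java
import java.nio.charset.StandardCharsets;

public class DefaultDecodingDemo {
    /** Hypothetical helper: decodes raw bytes as ISO-8859-1, like the default described above. */
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] utf8Bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        String decoded = decodeAsLatin1(utf8Bytes);

        // One char per byte: the single character has turned into two.
        System.out.println(decoded.length());                       // 2
        System.out.println(Integer.toHexString(decoded.charAt(0))); // c3
        System.out.println(Integer.toHexString(decoded.charAt(1))); // a9
    }
}
```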

Then again, knowing that ISO-8859-1 is being used may make it easier to
write a transcoder...

new String(myString.getBytes("ISO-8859-1"), "UTF-8")

That's ugly and I feel like it's asking for problems, but it might be
your only reasonable recourse.
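
Spelled out as a small helper, the one-liner might look like the sketch
below. The class and method names are hypothetical, and it assumes the
attribute value really was UTF-8 on the wire and was decoded as ISO-8859-1
along the way. StandardCharsets (Java 7+) avoids the checked
UnsupportedEncodingException that the String-named variant throws.

```java
import java.nio.charset.StandardCharsets;

public class Latin1ToUtf8 {
    /**
     * Hypothetical helper: re-interprets a String that was decoded as
     * ISO-8859-1 as if its original bytes had been UTF-8.
     */
    static String reinterpretAsUtf8(String latin1Decoded) {
        // ISO-8859-1 maps chars U+0000..U+00FF back to the identical byte
        // values, so this step recovers the original bytes exactly.
        byte[] raw = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What ISO-8859-1 decoding makes of the UTF-8 bytes 0xC3 0xA9.
        String mangled = "\u00c3\u00a9";

        String fixed = reinterpretAsUtf8(mangled);

        System.out.println(fixed.length());          // 1
        System.out.println(fixed.equals("\u00e9"));  // true
    }
}
```

Two things can go wrong silently here, which is part of why it feels
fragile: a char above U+00FF in the input is replaced with '?' by getBytes,
and bytes that are not valid UTF-8 come out as U+FFFD instead of raising an
error. Pure ASCII values pass through unchanged.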

-chris
