tomcat-users mailing list archives

From Mark Juszczec <mark.juszc...@gmail.com>
Subject Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem
Date Mon, 17 Oct 2016 20:38:16 GMT
On Mon, Oct 17, 2016 at 8:20 AM, Rainer Jung <rainer.jung@kippdata.de>
wrote:

> Am 17.10.2016 um 12:35 schrieb Mark Juszczec:
>
>> On Mon, Oct 17, 2016 at 4:29 AM, Mark Thomas <markt@apache.org> wrote:
>>
>>
>>> A small hint. I'd expect those to be % encoded.
>>>
>>>
>> Thank you very much for your reply.
>>
>> I've been thinking the problem is lack of % encoding after reading:
>>
>> *"Default encoding for GET*
>> The character set for HTTP query strings (that's the technical term for
>> 'GET parameters') can be found in sections 2 and 2.1 of the "URI Syntax"
>> specification. The character set is defined to be US-ASCII
>> <http://en.wikipedia.org/wiki/ASCII>. Any character that does not map to
>> US-ASCII must be encoded in some way. Section 2.1 of the URI Syntax
>> specification says that characters outside of US-ASCII must be encoded
>> using % escape sequences: each character is encoded as a literal %
>> followed by the two hexadecimal codes which indicate its character code.
>> Thus, a (US-ASCII character code 97 = 0x61) is equivalent to %61. There
>> *is no default encoding for URIs* specified anywhere, which is why there
>> is a lot of confusion when it comes to decoding these values."
>>
>> from http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
>>
>> Do you know if there's a way to force something (mod_jk, mod_rewrite or
>> something else) to % encode the data being fed into the AJP port?
>>
>
> You can force mod_jk to %-encode the URI before forwarding:
>
> JkOptions     +ForwardURIEscaped
>
>
I've tried adding +ForwardURIEscaped in my conf file as follows:

# JkOptions indicate to send SSL KEY SIZE,
JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories

I would have expected the mod_jk log to show the data % encoded, but it
does not:

text: J O Ë ‹ L
hex: 0x4a 0x4f 0xc3 0x8b 0x4c

I had expected to see something like:

JO%C3%8BL

Is that reasonable?  Does it make sense?
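
To make the expectation concrete, here is a minimal Python sketch (purely an illustration, not anything mod_jk itself runs) that percent-encodes the five bytes from the log using the standard library's urllib.parse.quote. It assumes those bytes are the UTF-8 encoding of "JOËL", and it also shows the ISO-8859-1 form and a mismatched decode for comparison:

```python
from urllib.parse import quote, unquote

# The five bytes reported in the mod_jk log: 0x4a 0x4f 0xc3 0x8b 0x4c
raw = bytes([0x4A, 0x4F, 0xC3, 0x8B, 0x4C])

# Percent-encode the raw bytes; ASCII letters pass through unchanged,
# each non-ASCII byte becomes %XX.
escaped = quote(raw)
print(escaped)

# Assumption: the bytes are UTF-8, so they decode to a four-character name.
name = raw.decode("utf-8")

# The same name escaped via ISO-8859-1 gives a different, shorter sequence.
escaped_latin1 = quote(name, encoding="iso-8859-1")
print(escaped_latin1)

# Decoding the UTF-8 escapes as ISO-8859-1 yields two wrong characters
# in place of the single original one.
mis_decoded = unquote(escaped, encoding="iso-8859-1")
print(len(name), len(mis_decoded))
```

If the receiving side decodes with a different charset than the sender used to encode, the name comes out one character longer and garbled, which matches the kind of UTF-8/ISO-8859-1 confusion in the subject line.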

Could something be turning off the encoding?  Do the header values need to
be set to something specific?
