tomcat-users mailing list archives

From Mark Juszczec <>
Subject Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem
Date Wed, 19 Oct 2016 18:42:46 GMT
On Tue, Oct 18, 2016 at 4:45 PM, Mark Juszczec <> wrote:

> On Tue, Oct 18, 2016 at 2:58 PM, Mark Juszczec <>
> wrote:
>> Some questions (if these are not relevant, please disregard):
>> I'm loading a whole bunch of modules.  Could some of them be incompatible?
>> DocumentRoot refers to a directory that does not exist.  Is that a
>> problem?
>> What does AddLanguage do?
>> Is AddDefaultCharset redundant?
>> Are +ForwardKeySize and -ForwardDirectories somehow disabling what
>> +ForwardURIEscaped does?
>> I have verified the data coming out of Shibboleth is what we expect.
> I think I've found where the byte data is coming in.
>'s method:
> protected boolean read(byte[] buf, int pos, int n, boolean block) throws
> IOException
> This ultimately gives me a great big buffer of bytes. Spring Tool Suite
> shows me the relevant ones:
> 74 79 -61 -117 76
I think I have found where these bytes are interpreted improperly and my
problems start.

In there is a method named  protected void

        // Decode extra attributes
        boolean secret = false;
        byte attributeCode;
        while ((attributeCode = requestHeaderMessage.getByte())
                != Constants.SC_A_ARE_DONE) {

            switch (attributeCode) {

            case Constants.SC_A_REQ_ATTRIBUTE :
                requestHeaderMessage.getBytes(tmpMB);
                String n = tmpMB.toString();
                requestHeaderMessage.getBytes(tmpMB);
                String v = tmpMB.toString();

I have debugged to the point where n="FirstName", the piece of
data giving me fits.

After requestHeaderMessage.getBytes(tmpMB); (the one after String n =
....), tmpMB shows "JOËL".

tmpMB is a MessageBytes.  It contains a ByteChunk, which wraps the array of
bytes I posted yesterday.

The ByteChunk has a start=1049 and an end=1054.  Those bytes are:

1049:    5
1050:   74    J
1051:   79    O
1052:  -61    0xF....C3
1053: -117    0xF....8B
1054:   76    L

The ByteChunk has a charset, and it is set to ISO-8859-1.
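
To make the effect of that charset concrete, here is a minimal sketch that
decodes the five payload bytes listed above both ways. The class and method
names (DecodeDemo, asUtf8, asLatin1) are made up for illustration and are not
part of Tomcat. Read as UTF-8, the pair 0xC3 0x8B is the single character Ë
(U+00CB); read as ISO-8859-1, each byte becomes its own character.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // The five payload bytes reported by the debugger (Java bytes are signed).
    static final byte[] RAW = {74, 79, -61, -117, 76};

    // Interpret the bytes as UTF-8: 0xC3 0x8B collapses into one character.
    public static String asUtf8() {
        return new String(RAW, StandardCharsets.UTF_8);
    }

    // Interpret the bytes as ISO-8859-1: every byte maps to exactly one char.
    public static String asLatin1() {
        return new String(RAW, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // -61 and -117 are the signed views of 0xC3 and 0x8B.
        System.out.printf("%02X %02X%n", RAW[2] & 0xFF, RAW[3] & 0xFF);
        System.out.println(asUtf8().length());   // 4 chars: J, O, U+00CB, L
        System.out.println(asLatin1().length()); // 5 chars: J, O, U+00C3, U+008B, L
    }
}
```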

So, that explains - at least to me - where things go wrong.

Now, the question is why.

Looking at, I see it has the following:

    /** Default encoding used to convert to strings. It should be UTF8,
        as most standards seem to converge, but the servlet API requires
        8859_1, and this object is used mostly for servlets.
    */
    public static final Charset DEFAULT_CHARSET =

    private Charset charset;

    public void setCharset(Charset charset) {
        this.charset = charset;
    }

    public Charset getCharset() {
        if (charset == null) {
            charset = DEFAULT_CHARSET;
        }
        return charset;
    }
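
The lazy-default pattern in that excerpt can be reproduced outside Tomcat
with a small stand-in. ChunkSketch below is a hypothetical class written for
this illustration, not Tomcat's ByteChunk, and it assumes the default is
ISO-8859-1, which is what the quoted comment ("the servlet API requires
8859_1") and the debugger observation suggest. It shows that the bytes only
decode as UTF-8 if something calls setCharset first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; this is not Tomcat's ByteChunk.
public class ChunkSketch {
    // Assumption: the default is ISO-8859-1, as the quoted comment suggests.
    public static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;

    private final byte[] buf;
    private Charset charset; // stays null until setCharset is called

    public ChunkSketch(byte[] buf) {
        this.buf = buf;
    }

    public void setCharset(Charset charset) {
        this.charset = charset;
    }

    // Falls back to the default the first time it is asked for a charset.
    public Charset getCharset() {
        if (charset == null) {
            charset = DEFAULT_CHARSET;
        }
        return charset;
    }

    // Decodes with whatever charset is in effect at call time.
    @Override
    public String toString() {
        return new String(buf, getCharset());
    }
}
```

If nothing ever calls setCharset, as the breakpoint suggests, toString() on
the bytes 74 79 -61 -117 76 yields five characters instead of four.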

I set a breakpoint on ByteChunk.setCharset(Charset) and it is never
called.

ByteChunk.getCharset() is called from MessageBytes.toBytes(), which is
called from AjpMessage.appendBytes(MessageBytes).

So, I think this explains why my data is being interpreted incorrectly.
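
Because ISO-8859-1 maps every byte value to exactly one character, a string
decoded this way still carries the original bytes, so the damage is
reversible on the application side. The sketch below is only a possible
workaround under that assumption, not a Tomcat setting or API; Recode and
latin1ToUtf8 are names invented for this example. It should only be applied
to values known to have been UTF-8 on the wire, since genuine ISO-8859-1 text
with non-ASCII characters would be corrupted by it.

```java
import java.nio.charset.StandardCharsets;

public class Recode {
    /**
     * Re-decodes a string that was read as ISO-8859-1 but whose
     * underlying bytes were actually UTF-8.
     */
    public static String latin1ToUtf8(String misdecoded) {
        // Recover the original bytes: ISO-8859-1 is a 1:1 byte/char mapping.
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes the way the sender intended.
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // J, O, U+00C3, U+008B, L: the five characters seen after mis-decoding.
        String misdecoded = "JO\u00C3\u008BL";
        System.out.println(latin1ToUtf8(misdecoded).equals("JO\u00CBL")); // prints true
    }
}
```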

Now, the question becomes: why isn't this line in server.xml

 <Connector port="XXXX"
                  connectionTimeout="600000" />

enough to cause ByteChunk.charset to be set to "UTF-8"?

Does anyone have any thoughts as to how to proceed?
