tomcat-users mailing list archives

From Mark Juszczec <>
Subject Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem
Date Thu, 20 Oct 2016 13:55:15 GMT
On Thu, Oct 20, 2016 at 4:21 AM, André Warnier (tomcat) <> wrote:

> Can you tell us (or remind us) exactly how the browser is sending this
> request for the parameter "JOEL" (with diaeresis on the E) to the server?
> Is it a part of the query-string of the URL, or is it in the body of a
> POST request?
> The following on-line documentation describes precisely how this should
> work :
> (See "URIEncoding", but also "useBodyEncodingForURI", and follow the link
> provided to the same attributes in the HTTP Connector :
> )
> So check exactly what you are doing, and if that matches these rules
> somehow.
> Personal rant :
> Unfortunately, this is still a big mess in the HTTP protocol.
> And the people in charge of the design of the protocol missed a golden
> opportunity of cleaning this up in HTTP 2.x and making Unicode/UTF-8 the
> default, instead of clinging to iso-8859-1. Thus condemning all web
> programmers worldwide to another 20 years of obscure bugs and clunky
> work-arounds.
> (s) Andr%C3%A9
The data is being returned by Shibboleth and passed to Tomcat in the body
of an HTTP GET request.

This is by design of the application and there's nothing I can do about it.

As such, my only options for enforcing UTF-8 are by using "URIEncoding"
and/or "useBodyEncodingForURI" as described in the links.

I've done this and it has not had any impact on the problem.

Last night, I found these bits of information:

My interpretation (and PLEASE tell me if I'm wrong) is, since at least
2007, headers have been locked in to the ISO-8859-1 charset due to specs
that govern how the world wide web is going to work.


goes on to reiterate what the first link says and propose a workaround (see
the Java link at the end of the page)

"Shibboleth attributes are by default UTF-8 encoded. However, depending on
the servlet container configuration they are interpreted as ISO-8859-1
values. This causes problems with non-ASCII characters. The solution is to
re-encode attributes, e.g. with:

String value= request.getHeader("givenName");
value= new String( value.getBytes("ISO-8859-1"), "UTF-8");"

Although MY data is delivered as attributes (so I have to use
request.getAttribute("FirstName")), this works.
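For illustration, here is a minimal sketch of the same re-encoding applied to a value obtained as an attribute rather than a header. The class and method names are hypothetical, and it assumes the attribute arrives as a String whose UTF-8 bytes were decoded as ISO-8859-1. The main method simulates that condition instead of talking to a real container.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tomcat or Shibboleth.
class ReencodeAttribute {

    // Reinterpret a value that was decoded as ISO-8859-1 although its bytes were UTF-8.
    // ISO-8859-1 maps every byte to exactly one char, so getBytes() recovers the raw bytes.
    static String reencode(Object attribute) {
        if (attribute == null) {
            return null;
        }
        String value = attribute.toString();
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "JOËL" with the diaeresis written as a Unicode escape.
        String original = "JO\u00CBL";
        // Simulate the assumed failure: UTF-8 bytes decoded with the wrong charset.
        String misdecoded = new String(original.getBytes(StandardCharsets.UTF_8),
                                       StandardCharsets.ISO_8859_1);
        String repaired = reencode(misdecoded);
        System.out.println(repaired.equals(original));
    }
}
```

In a servlet this would be called as reencode(request.getAttribute("FirstName")). Using the StandardCharsets constants instead of charset names also avoids the checked UnsupportedEncodingException.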

ISO-8859-1 is the default used by ByteChunk and I've verified it is not
reset/changed to UTF-8 despite having specified it in server.xml per Tomcat

I found this:

which says this problem has been around since at least 2007

Then I found this:

which suggests the following solution:

String value= request.getHeader("givenName");
value= new String( value.getBytes("ISO-8859-1"), "UTF-8");

I have to get my data via request.getAttribute("key")

Is the solution appropriate for data delivered as attributes?
I have read the information that says it's a dangerous hack, which is the
main reason I have not implemented it.
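One possible way to limit the risk of the re-encoding is to apply it only when the value actually looks like UTF-8 that was decoded as ISO-8859-1. The sketch below is an assumption about how that could be done, not something taken from the Tomcat or Shibboleth documentation, and the class and method names are hypothetical. It leaves the value alone if it contains a char above U+00FF or is pure ASCII, and otherwise attempts a strict UTF-8 decode, falling back to the original on failure.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tomcat or Shibboleth.
class SafeReencode {

    static String repairIfMisdecoded(String value) {
        if (value == null) {
            return null;
        }
        boolean hasHighChar = false;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c > 0xFF) {
                return value; // cannot be the result of an ISO-8859-1 decode
            }
            if (c > 0x7F) {
                hasHighChar = true;
            }
        }
        if (!hasHighChar) {
            return value; // pure ASCII is identical in both charsets
        }
        byte[] raw = value.getBytes(StandardCharsets.ISO_8859_1);
        // A strict decoder throws on malformed input instead of substituting U+FFFD.
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return value; // bytes are not valid UTF-8, so keep the original
        }
    }
}
```

This remains a heuristic: a correctly decoded value that happens to form a valid UTF-8 byte sequence (for example the two characters "Ã©") would still be rewritten, so it reduces the danger rather than removing it.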

However, the Shibboleth forum posts and what I've discovered about
ByteChunk seem to cast this in a different light.

Any thoughts, comments would be greatly appreciated.
