tomcat-users mailing list archives

From André Warnier (tomcat) ...@ice-sa.com>
Subject Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem
Date Thu, 20 Oct 2016 18:10:23 GMT
On 20.10.2016 15:55, Mark Juszczec wrote:
> On Thu, Oct 20, 2016 at 4:21 AM, André Warnier (tomcat) <aw@ice-sa.com>
> wrote:
>
>>
>> Can you tell us (or remind us) exactly how the browser is sending this
>> request for the parameter "JOEL" (with diaeresis on the E) to the server ?
>> Is it a part of the query-string of the URL, or is it in the body of a
>> POST request ?
>>
>> The following on-line documentation describes precisely how this should
>> work :
>> http://tomcat.apache.org/tomcat-8.0-doc/config/ajp.html#Attributes
>> (See "URIEncoding", but also "useBodyEncodingForURI", and follow the link
>> provided to the same attributes in the HTTP Connector :
>> http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes
>> )
>>
>> So check exactly what you are doing, and if that matches these rules
>> somehow.
>>
>> Personal rant :
>> Unfortunately, this is still a big mess in the HTTP protocol.
>> And the people in charge of the design of the protocol missed a golden
>> opportunity of cleaning this up in HTTP 2.x and making Unicode/UTF-8 the
>> default, instead of clinging to iso-8859-1. Thus condemning all web
>> programmers worldwide to another 20 years of obscure bugs and clunky
>> work-arounds.
>>
>> (s) Andr%C3%A9
>>
>>
> The data is being returned by Shibboleth and passed to Tomcat in the body
> of an HTTP GET request.

Nitpick : that is a contradiction in terms. Per the RFC, a body in a GET request has no
defined semantics.
See : https://tools.ietf.org/html/rfc7231#section-4  4.3.1 GET

I don't know Shibboleth, and I do not know how it works exactly, but based on what you 
seem to imply here, I will assume that the "joel" in question is being passed as part of 
the GET request URL (like "..?givenName=joel&otherparam=xxx..").
(Technically, that part is the "query-string" part of the URI).

Based on what else you indicate below about Shibboleth, I would also assume that the "e
with dieresis" (sorry, can't type it on my German keyboard) is passed in that
query-string as iso-8859-1, perhaps percent-encoded as %CB (upper-case) or %EB (lower-case).

Receiving this, recent Tomcats would decode this either as iso-8859-1 (latin-1) (if
STRICT_SERVLET_COMPLIANCE is enforced), or as UTF-8 (by default), or according to what you
set as "URIEncoding" and/or "useBodyEncodingForURI".
If it tries UTF-8, that may or may not generate a valid Java Unicode character, but it
would in any case not be the character that you expect.
If you set it to decode the URIs using iso-8859-1, then it would decode this correctly
(and generate the correct java Unicode character in your application), but it would decode
*all* further request URIs using iso-8859-1, which would most probably have adverse
effects on the rest of your application.
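
To make the difference concrete, here is a minimal sketch, assuming the value really
arrives percent-encoded as "jo%EBl". It uses the JDK's java.net.URLDecoder as a stand-in,
which is not Tomcat's own decoder; the class name and the sample value are made up for
illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; the sample value "jo%EBl" is an assumption, not taken from real traffic.
class QueryDecodingDemo {

    // Decode one percent-encoded query-string value with the named charset.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "jo%EBl";            // 0xEB is "e with dieresis" in iso-8859-1
        String expected = "jo\u00EBl";    // the name as a proper Java (Unicode) string

        String asLatin1 = decode(raw, "ISO-8859-1");
        String asUtf8 = decode(raw, "UTF-8");

        // A lone 0xEB byte is not a complete UTF-8 sequence, so the UTF-8
        // result cannot be the expected character.
        System.out.println("ISO-8859-1 gives the expected name: " + asLatin1.equals(expected));
        System.out.println("UTF-8 gives the expected name: " + asUtf8.equals(expected));
    }
}
```

The same byte, decoded under the two charsets, gives two different Java strings, and only
the iso-8859-1 one is the name you expect.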

So it would seem that you are stuck somewhere in-between.
But it is not a Tomcat issue, it is a Shibboleth issue.
(Or rather, a Shibboleth-and-HTTP-defaulting-to-iso-8859-1 issue).

>
> This is by design of the application and there's nothing I can do about it.
>

Neither can we.

> As such, my only options for enforcing UTF-8 are by using "URIEncoding"
> and/or "useBodyEncodingForURI" as described in the links.
>
> I've done this and it has not had any impact on the problem.
>
> Last night, I found these bits of information:
>
> https://issues.shibboleth.net/jira/browse/SSPCPP-2
>
> My interpretation (and PLEASE tell me if I'm wrong) is, since at least
> 2007, headers have been locked in to the ISO-8859-1 charset due to specs
> that govern how the world wide web is going to work.
>

Well yes, see my previous rant.
See : https://tools.ietf.org/html/rfc7230#section-3.2
3.2.4.  Field Parsing (at the end)

> This:
>
> https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPAttributeAccess

I am sorry, but I do not really have the time right now (nor the setup) to investigate
further into what Shibboleth is doing, or what they are really explaining in that article.
But from skimming it, I have a suspicion that the following may help you, in the case of
a mod_jk Connector to Tomcat :

http://tomcat.apache.org/connectors-doc/reference/apache.html

JkEnvVar

"Adds a name and an optional default value of environment variable that should be sent to
servlet-engine as a request attribute. If the default value is not given explicitly, the
variable will only be send, if it is set during runtime.
The default is empty, so no additional variables will be sent.
This directive can be used multiple times per virtual server. The settings will be merged
between the global server and any virtual server.
You can retrieve the variables on Tomcat as request attributes via
request.getAttribute(attributeName). Note that the variables send via JkEnvVar will not be
listed in request.getAttributeNames().
Empty default values are supported since version 1.2.20. Not sending variables with empty
defaults and empty runtime value has been introduced in version 1.2.21."

In other words : if Shibboleth can send this value in the form of a HTTP header, and you
can configure the Apache httpd front-end to pick up the value of that header and set it
into an "Apache environment variable" (perhaps with mod_rewrite and a RewriteRule), then
you could ask mod_jk to forward this variable content to Tomcat, as a request attribute
(and thus pick it up with request.getAttribute(), and perhaps in the correct encoding).

A lot of speculation here..

(And maybe by the above, I am just duplicating what Shibboleth already does by itself)


>
> goes on to reiterate what the first link says and propose a workaround (see
> the Java link at the end of the page)
>
> "Shibboleth attributes are by default UTF-8 encoded. However, depending on
> the servlet container configuration they are interpreted as ISO-8859-1
> values. This causes problems with non-ASCII characters. The solution is to
> re-encode attributes, e.g. with:
>
> String value= request.getHeader("givenName");
> value= new String( value.getBytes("ISO-8859-1"), "UTF-8");"
>
>
> Although MY data is delivered as attributes (so I have to use
> request.getAttribute("FirstName") )  this works
>
> ISO-8859-1 is the default used by ByteChunk and I've verified it is not
> reset/changed to UTF-8 despite having specified it in server.xml per Tomcat
> documentation.
>
> I found this:
>
> https://issues.shibboleth.net/jira/browse/SSPCPP-2
>
> which says this problem has been around since at least 2007
>
> Then I found this:
>
> https://wiki.shibboleth.net/confluence/plugins/servlet/mobile#content/view/4358180
>
> which suggests the following solution:
>
> String value= request.getHeader("givenName");
> value= new String( value.getBytes("ISO-8859-1"), "UTF-8");
>
> I have to get my data via request.getAttribute("key")
>
> Is the solution appropriate for data delivered as attributes?
> I have read the information that says it's a dangerous hack, which is the main
> reason I have not implemented it.
>
> However, the Shibboleth forum posts and what I've discovered about
> ByteChunk seem to cast this in a different light.
>
> Any thoughts, comments would be greatly appreciated.
>
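
On whether the re-encoding trick can be applied to a value obtained with
request.getAttribute() : below is a minimal sketch of a more defensive variant, assuming
the attribute is a String whose UTF-8 bytes were decoded as iso-8859-1 somewhere along the
way. The class and method names are hypothetical. It converts only when the round trip is
lossless and the bytes are strictly valid UTF-8, and otherwise returns the value
unchanged.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Tomcat or Shibboleth.
class AttributeReencoder {

    // Returns null for non-String attributes, the re-decoded text when the value
    // looks like UTF-8 bytes misread as iso-8859-1, and the original value otherwise.
    static String reencode(Object attribute) {
        if (!(attribute instanceof String)) {
            return null;
        }
        String value = (String) attribute;

        // Characters above U+00FF cannot come from an iso-8859-1 misread,
        // and getBytes() would silently replace them with '?'.
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(value)) {
            return value;
        }
        byte[] bytes = value.getBytes(StandardCharsets.ISO_8859_1);

        try {
            // Strict decoding: malformed input raises an exception instead of
            // being replaced by U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so the value was probably decoded correctly already.
            return value;
        }
    }
}
```

In a servlet this would be called as
AttributeReencoder.reencode(request.getAttribute("FirstName")). It is still a heuristic :
a genuine iso-8859-1 value whose bytes happen to form valid UTF-8 would be altered too.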



