tomcat-users mailing list archives

From André Warnier (tomcat) ...@ice-sa.com>
Subject Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem
Date Thu, 20 Oct 2016 08:21:30 GMT
On 19.10.2016 20:42, Mark Juszczec wrote:
> On Tue, Oct 18, 2016 at 4:45 PM, Mark Juszczec <mark.juszczec@gmail.com>
> wrote:
>
>>
>>
>> On Tue, Oct 18, 2016 at 2:58 PM, Mark Juszczec <mark.juszczec@gmail.com>
>> wrote:
>>>
>>>
>>> Some questions (if these are not relevant, please disregard):
>>>
>>> I'm loading a whole bunch of modules.  Could some of them be incompatible?
>>>
>>> DocumentRoot refers to a directory that does not exist.  Is that a
>>> problem?
>>>
>>> What does AddLanguage do?
>>>
>>> Is AddDefaultCharset redundant?
>>>
>>> Are +ForwardKeySize and -ForwardDirectories somehow disabling what
>>> +ForwardURIEscaped does?
>>>
>>> I have verified the data coming out of Shibboleth is what we expect.
>>>
>>
>> I think I've found where the byte data is coming in.
>>
>> AjpAprProcessor.java's method:
>>
>> protected boolean read(byte[] buf, int pos, int n, boolean block) throws
>> IOException
>>
>> This ultimately gives me a great big buffer of bytes. Spring Tool Suite
>> shows me the relevant ones:
>>
>> 74 79 -61 -117 76
>>
>>
> I think I have found where these bytes are interpreted improperly and my
> problems start.
>
> In AbstractAjpProcessor.java there is a method named  protected void
> prepareRequest()
>
>          // Decode extra attributes
>          boolean secret = false;
>          byte attributeCode;
>          while ((attributeCode = requestHeaderMessage.getByte())
>                  != Constants.SC_A_ARE_DONE) {
>
>              switch (attributeCode) {
>
>              case Constants.SC_A_REQ_ATTRIBUTE :
>                  requestHeaderMessage.getBytes(tmpMB);
>                  String n = tmpMB.toString();
>                  requestHeaderMessage.getBytes(tmpMB);
>                  String v = tmpMB.toString();
>
> I have debugged and gotten to the point where n="FirstName" - the bit of
> data giving me fits
>
> After  requestHeaderMessage.getBytes(tmpMB); (the one after String n =
> ....) tmpMB shows "JOËL"
>
> tmpMB is a MessageBytes.  It contains a ByteChunk, which is the array of
> bytes I posted yesterday.
>
> The ByteChunk has a start=1049 and an end=1054.  That is bytes
>
> 1049: 5
> 1050: 74      J
> 1051: 79      O
> 1052: -61     0xC3
> 1053: -117    0x8B
> 1054: 76      L
>
> The ByteChunk has a charset and it is set to ISO-8859-1
>
> So, that explains - at least to me - where things go wrong.
>
> Now, the question is why.
>
> Looking at ByteChunk.java, I see it has the following:
>
>      /** Default encoding used to convert to strings. It should be UTF8,
>          as most standards seem to converge, but the servlet API requires
>          8859_1, and this object is used mostly for servlets.
>      */
>      public static final Charset DEFAULT_CHARSET =
> StandardCharsets.ISO_8859_1;
>
>      private Charset charset;
>
>      public void setCharset(Charset charset) {
>          this.charset = charset;
>      }
>
>      public Charset getCharset() {
>          if (charset == null) {
>              charset = DEFAULT_CHARSET;
>          }
>          return charset;
>      }
>
> I set a breakpoint on ByteChunk.setCharset(Charset) and it is never
> executed.
>
> ByteChunk.getCharset() is called from MessageBytes.toBytes() which is
> called from AjpMessage.appendBytes(MessageBytes)
>
> So, I think this explains why my data is being interpreted incorrectly.
>
> Now, the question becomes why isn't this line in server.xml:
>
>   <Connector port="XXXX"
>                    emptySessionPath="true"
>                    enableLookups="false"
>                    redirectPort="YYYY"
>                    protocol="AJP/1.3"
>                    maxThreads="300"
>                    URIEncoding="UTF-8"
>                    connectionTimeout="600000" />
>
> enough to cause ByteChunk.charset to be set to "UTF-8"?
>
> Does anyone have any thoughts as to how to proceed?
>
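
To make the byte-level finding in the quoted message concrete, here is a minimal Java sketch that decodes the five reported bytes (74 79 -61 -117 76) under both charsets. The class and method names are invented for illustration and are not part of Tomcat. The redecode() helper shows a commonly used application-side workaround; whether it is appropriate here depends on how the value actually reaches Tomcat, which is still an open question in this thread.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: CharsetDemo and its methods are hypothetical names, not Tomcat code.
public class CharsetDemo {

    // The five bytes reported from the debugger: 'J', 'O', 0xC3, 0x8B, 'L'.
    static final byte[] RAW = {74, 79, -61, -117, 76};

    // What happens when the bytes are read with the ISO-8859-1 default.
    static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // What the sender presumably intended: the same bytes read as UTF-8.
    static String asUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Workaround sketch: ISO-8859-1 maps every byte to exactly one char,
    // so the wrong decode can be reversed and the bytes decoded again as UTF-8.
    static String redecode(String wronglyDecoded) {
        byte[] original = wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String latin1 = asLatin1(RAW);
        String utf8 = asUtf8(RAW);
        System.out.println("ISO-8859-1 length: " + latin1.length());
        System.out.println("UTF-8 length:      " + utf8.length());
        System.out.println("redecode matches:  " + redecode(latin1).equals(utf8));
    }
}
```

Read as ISO-8859-1, the bytes 0xC3 0x8B become two separate characters; read as UTF-8, they form the single character U+00CB (E with diaeresis).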

Can you tell us (or remind us) exactly how the browser is sending this request for the
parameter "JOEL" (with a diaeresis on the E) to the server?
Is it part of the query string of the URL, or is it in the body of a POST request?

The following online documentation describes precisely how this should work:
http://tomcat.apache.org/tomcat-8.0-doc/config/ajp.html#Attributes
(See "URIEncoding", but also "useBodyEncodingForURI", and follow the link provided to the
same attributes in the HTTP Connector:
http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes)

So check exactly what you are doing, and if that matches these rules somehow.
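
As a rough illustration of what a URI-decoding charset governs, the sketch below percent-decodes the same name as it might appear in a query string, once as UTF-8 and once as ISO-8859-1. It uses the JDK's java.net.URLDecoder as a stand-in and is not Tomcat's own decoder; the class name UriEncodingDemo and the sample value "JO%C3%8BL" are assumptions made for the example. Note that this only covers values carried in the URL, which is why it matters whether the parameter arrives in the query string, in a POST body, or by some other route.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustrative only: a stand-in for URI decoding, not Tomcat's implementation.
public class UriEncodingDemo {

    // Percent-decode a query-string value using the named charset.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "JO%C3%8BL"; // hypothetical query-string value

        String utf8 = decode(raw, "UTF-8");
        String latin1 = decode(raw, "ISO-8859-1");

        // Same percent-escapes, different results depending on the charset.
        System.out.println("UTF-8 length:      " + utf8.length());
        System.out.println("ISO-8859-1 length: " + latin1.length());
    }
}
```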

Personal rant:
Unfortunately, this is still a big mess in the HTTP protocol.
And the people in charge of the design of the protocol missed a golden opportunity to
clean this up in HTTP 2.x and make Unicode/UTF-8 the default, instead of clinging to
iso-8859-1, thus condemning all web programmers worldwide to another 20 years of obscure
bugs and clunky work-arounds.

(s) Andr%C3%A9





