struts-dev mailing list archives

From "David Evans (JIRA)" <j...@apache.org>
Subject [jira] Reopened: (STR-1941) double UTF-8 encoding of HTTP request parameters
Date Mon, 24 Apr 2006 20:32:09 GMT
     [ http://issues.apache.org/struts/browse/STR-1941?page=all ]
     
David Evans reopened STR-1941:
------------------------------

    Assign To: David Evans  (was: Struts Developer Mailing List)

> double UTF-8 encoding of HTTP request parameters
> ------------------------------------------------
>
>          Key: STR-1941
>          URL: http://issues.apache.org/struts/browse/STR-1941
>      Project: Struts Action 1
>         Type: Bug

>   Components: Action
>     Versions: Nightly Build
>  Environment: Operating System: other
> Platform: Other
>     Reporter: Akos Maroy
>     Assignee: David Evans

>
> I'm having a problem with properly processing UTF-8 encoded request parameters
> through Struts. The effect is that international characters (those that are not
> ASCII, and thus are multi-byte in UTF-8) are encoded into UTF-8 twice.
> As an example, let's look at the examples webapp included in the jakarta-struts
> source tree. It has the registration sample, reachable through
> http://localhost:8080/struts-examples/validator/registration.do
> if installed on localhost:8080. Let's suppose I wish to type:
> small letter a with acute: á
> unicode value hex:         00e1
> unicode value binary:      11100001
> UTF-8 binary:              11000011 10100001
> UTF-8 in hex:              c3a1
> into the firstName field of the form. This can be simulated by:
> http://localhost:8080/struts-examples/validator/registration-submit.do?firstName=%C3%A1
> (typing it manually and submitting via POST has the same effect)
> The resulting page shows a lot of form problems, as I didn't fill out most of the
> fields, which is OK. But more importantly, it also shows the entered letter in
> the firstName input field. What is weird is that a different letter is shown
> (actually two letters). Running xxd on the received page, here's the relevant part:
> 00003a0: 6e67 7468 3d22 3330 2220 7369 7a65 3d22  ngth="30" size="
> 00003b0: 3330 2220 7661 6c75 653d 22c3 83c2 a122  30" value="...."
> 00003c0: 3e0a 2020 2020 3c2f 7464 3e0a 2020 3c2f  >.    </td>.  </
> with the important part at value="....", which is:
> 00003b0: 3330 2220 7661 6c75 653d 22c3 83c2 a122  30" value="...."
>                                     ^^^^^^^^^^
> the letters presented are:
> UTF-8 hex sequence: c383c2a1
> UTF-8 binary:       11000011 10000011 11000010 10100001
> which is by now actually two UTF-8 letters. What is funny is that if I 'decode'
> them from UTF-8, I get the original UTF-8 sequence:
> first part, as received: 11000011 10000011
> de-coded:                11000011
> second part, as received: 11000010 10100001
> de-coded:                 10100001
> and voila, the parts make up the original UTF-8 sequence:
> 11000011 10100001
> which actually is the UTF-8 sequence for the letter sent.
> If I resend this page (with what are by now two UTF-8 letters), I get four letters, then 8,
> etc. It seems that the engine doesn't recognize that there are UTF-8 sequences
> to begin with, and encodes them 'again'.
> I'm using Mozilla as a browser, Tomcat 5.0.16. The encoding of the pages is UTF-8.
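
The byte sequences in the report are consistent with the UTF-8 bytes being read as single-byte ISO-8859-1 characters somewhere along the way and then written out again as UTF-8. That cause is an assumption, not something the report confirms. A minimal Python sketch under that assumption, where `reencode` and `undo` are hypothetical helper names used only for illustration:

```python
def reencode(raw: bytes) -> bytes:
    """Misread UTF-8 bytes as ISO-8859-1 text, then encode that text as UTF-8.

    Assumption: this is the mix-up behind the report; the report does not confirm it.
    """
    return raw.decode("iso-8859-1").encode("utf-8")


def undo(raw: bytes) -> bytes:
    """Reverse one round of the mix-up (the manual 'de-coding' in the report)."""
    return raw.decode("utf-8").encode("iso-8859-1")


original = "\u00e1".encode("utf-8")  # small letter a with acute
once = reencode(original)

print(original.hex())    # c3a1
print(once.hex())        # c383c2a1
print(undo(once).hex())  # c3a1

# Resubmitting the page repeats the mix-up, so the letter count doubles each round.
value = original
for round_number in range(1, 4):
    value = reencode(value)
    print(round_number, len(value.decode("utf-8")), "letters")
```

Under this assumption, one round reproduces the `c383c2a1` sequence from the xxd dump, and three rounds yield 2, 4, and 8 letters, matching the growth described above.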

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/struts/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@struts.apache.org
For additional commands, e-mail: dev-help@struts.apache.org

