struts-user mailing list archives

From J.Patterson Waltz III <li...@cerenit.com>
Subject Re: Character encoding problems after 1.1 to 1.2.4 upgrade
Date Thu, 06 Jan 2005 14:52:25 GMT

On 5 Jan. 05, at 13:30, J. Patterson Waltz III wrote:

> in article 41DACC14.7080703@ink.org, Josh Cronemeyer at josh@ink.org 
> wrote
> on 4/01/05 18:02:
>
>> J. Patterson Waltz III wrote:
>>
>>> Merci Guillaume,
>>>
>>> I had actually seen the references to the Filter solution in the 
>>> comments of
>>> Struts bug 16191 in Bugzilla:
>>> http://issues.apache.org/bugzilla/show_bug.cgi?id=16191
>>>
>>> I will try that out and see if it improves my results.
>>>
>>> I remain perplexed at what changes between versions 1.1 and 1.2.4 of 
>>> Struts
>>> caused it to become susceptible to this problem. Any ideas on that?
>>>
>>> Patterson
>>>
>>> P.S. - I know how to view the headers of replies sent from the 
>>> server to the
>>> browser, but am not sure how to get at those sent from the browser 
>>> to the
>>> server, to make sure that they are indeed UTF-8. Any suggestions?
>>>
>>> in article 87llb9p5ig.fsf@meuh.mnc.ch, Guillaume Cottenceau at 
>>> gc@mnc.ch
>>> wrote on 4/01/05 16:40:
>>>
>>>> Most probably the browser is sending data in UTF-8 but doesn't say 
>>>> so with
>>>> charset= in the Content-Type header. If you're confident enough 
>>>> that the
>>>> browsers will send UTF-8 (which should be the case if they are 
>>>> encoded in
>>>> UTF-8 and you use accept-charset in the forms), you can use a 
>>>> filter which
>>>> forces the HTTP request to be seen as UTF-8 in input (for example
>>>> filters/SetCharacterEncodingFilter which is bundled with tomcat)[1].
>>>>
>>>>
>>>
>> One way would be to set up a proxy that your browser uses to connect
>> to the web. I used to use WebScarab:
>> http://www.owasp.org/software/webscarab.html
>>
>> -josh
>>
> I still haven't figured out the solution to my problem, but I have 
> figured
> out one *cause* of it.
>
> After using web scarab at Josh's suggestion to eavesdrop on the
> conversations between my browser and app server, I've figured out what 
> has
> changed between Struts 1.1 and 1.2.4:
>
> Struts 1.1: in spite of the <%@ page pageEncoding="UTF-8"
> contentType="text/html;charset=UTF-8" language="java" %> directive,
> Struts was in fact sending back pages encoded in ISO-8859-1, and the
> resulting form submissions were URL-encoded (and decoded) in the same
> encoding.
>
> Struts 1.2.4: the directive is now being followed, and pages are sent
> out in UTF-8 encoding. However, for some reason, the form data is
> still being decoded as ISO-8859-1 rather than as UTF-8.
>
> This suggests to me that if I were to remove the contentType directive 
> in
> Struts 1.2.4, it would fall back to the default ISO-8859-1 encoding, 
> and all
> would work as before: except that my web application would then be
> constrained to using only characters representable in that encoding.
>
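
The mismatch described above can be reproduced outside Struts. What
follows is a minimal sketch using only java.net.URLEncoder and
java.net.URLDecoder; the class and method names are made up for
illustration, and it only approximates what a browser and a servlet
container do with a single form field.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Struts or the servlet API.
class FormDecodingDemo {

    // Roughly what a browser does when it submits a field from a UTF-8 page.
    static String browserEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Roughly what the server does when it decodes the field with a given charset.
    static String serverDecode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String submitted = "caf\u00E9"; // "cafe" with an acute accent on the e
        String onTheWire = browserEncode(submitted);
        System.out.println(onTheWire); // prints "caf%C3%A9"
        // Decoded as ISO-8859-1, each UTF-8 byte becomes its own character.
        System.out.println(submitted.equals(serverDecode(onTheWire, "ISO-8859-1"))); // prints "false"
        // Decoded as UTF-8, the original text comes back.
        System.out.println(submitted.equals(serverDecode(onTheWire, "UTF-8"))); // prints "true"
    }
}
```

Decoding with ISO-8859-1 is the corruption seen under 1.2.4; decoding
with UTF-8 is what a filter that calls
request.setCharacterEncoding("UTF-8") before the parameters are read is
meant to arrange.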

More results of my testing: I was indeed able to restore the proper
encoding of characters submitted to my web application by removing both
the @ page contentType directive *and* the corresponding <controller
contentType="text/html; charset=UTF-8" /> element from the
struts-config.xml file. However, as predicted, this only enabled proper
encoding of characters within the ISO-8859-1 range: text in languages
such as Japanese, whose characters fall outside that range, was munged
into a series of question marks.
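
The question marks are what Java itself produces when text is forced
through ISO-8859-1. A minimal sketch, assuming the container ends up
doing something equivalent to String.getBytes with that charset (the
class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class Latin1RoundTripDemo {

    // Encode text as ISO-8859-1 bytes and decode it again.
    // String.getBytes substitutes '?' for characters the charset cannot represent.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Accented Latin characters are inside the ISO-8859-1 range and survive.
        System.out.println(roundTrip("caf\u00E9").equals("caf\u00E9")); // prints "true"
        // Three Japanese characters are outside the range and are replaced.
        System.out.println(roundTrip("\u65E5\u672C\u8A9E")); // prints "???"
    }
}
```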

I also determined that adding neither an acceptCharset="UTF-8" nor an
enctype="application/x-www-form-urlencoded;charset=UTF-8" attribute to
the <html:form> tags in my JSPs made any difference to the encoding
problems. The Content-Type header sent by the browser never includes
the character set information. This is not particularly surprising, as
the HTML 4.0 spec (see
http://www.w3.org/TR/REC-html40/interact/forms.html#submit-format)
specifies:
> the "get" method restricts form data set values to ASCII characters. 
> Only the "post" method (with enctype="multipart/form-data") is 
> specified to cover the entire [ISO10646] character set.

Adding enctype="multipart/form-data;charset=UTF-8" worked even less
well, however: Struts did not appear able to interpret form data
submitted in this format. It displayed validation errors saying that
required form fields were missing, although they had been submitted
with complete information and were visible in the request sent by the
browser.

Now, I guess I'll just have to try using the character encoding filter 
Guillaume recommended.

