tomcat-dev mailing list archives

From Hans Bergsten <h...@gefionsoftware.com>
Subject Re: Justification for URIEncoding addition?
Date Fri, 21 Nov 2003 22:20:03 GMT
Larry Isaacs wrote:
> Hans,
> 
> The behavior change is unrelated to the use of getParameter()
> to search for "jsp_precompile".  Both Tomcat 3.x and Tomcat 4.x
> were bitten by this long ago and Craig's fix was applied to both.
> In Tomcat 4's case, it was prior to the 4.0 release.

Okay, I'm sure you're right that there may be more to it than
avoiding the getParameter() call in Jasper, but based on what
I've read, it seems to be part of the problem at least.
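
As a rough sketch of that idea, the precompile check can be made against
the raw query string, so that getParameter() is never called and parameter
parsing is not triggered before the application has had a chance to call
setCharacterEncoding(). The class and method names below are made up for
illustration, and this is not Jasper's actual code; in a servlet, the input
would be the result of request.getQueryString().

```java
// Hypothetical helper for illustration only; not taken from Jasper.
final class PrecompileCheck {
    private static final String NAME = "jsp_precompile";

    /**
     * Returns true if the raw (still percent-encoded) query string contains
     * a parameter named jsp_precompile. Nothing is decoded, so no character
     * encoding has to be chosen at this point.
     */
    static boolean hasPrecompileParam(String rawQuery) {
        if (rawQuery == null) {
            return false;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = (eq < 0) ? pair : pair.substring(0, eq);
            if (NAME.equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The parameter is present as a name: true
        System.out.println(hasPrecompileParam("jsp_precompile=true&name=%E9"));
        // The string only appears as a value: false
        System.out.println(hasPrecompileParam("name=jsp_precompile"));
    }
}
```

Because the parameter name is plain ASCII, comparing it against the
undecoded text is enough for this check.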

> Assuming I have a good grip on the issue, I think it relates
> to using UTF-8 to decode the path portion of the URL which
> gets used to determine context, servlet mapping, etc.  Then
> allowing setCharacterEncoding() to change the character encoding
> for the query portion of the same URL.  The Servlet 2.3 and 2.4
> specs both say setCharacterEncoding() applies to the request body
> but don't mention it applying to the query portion of the URL.

Right, but since the servlet spec doesn't say anything about encoding
for the query portion, I think we have some room for a sensible
interpretation.

My concern is that with the new decoding behavior, apps that used to
work fine suddenly don't, and the reason seems to be that browsers
in fact ignore the RFC2718 recommendation that TC now enforces. I'm
all for compliance with all related specs, but in this case it's just
a recommendation and following it seems to do more harm than good.
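
To make the mismatch concrete, here is a small sketch of what happens when
a value is percent-encoded using the page's charset but decoded with a
different one. It assumes an ISO-8859-1 page purely as an example; the class
and helper names are invented for illustration, and only the standard
java.net.URLEncoder and java.net.URLDecoder classes are used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Illustrative only: the names here are not from Tomcat.
final class EncodingMismatchDemo {
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent

        // What a browser would send for an ISO-8859-1 page (assumption).
        String sent = encode(original, "ISO-8859-1");
        System.out.println(sent); // %E9

        // Decoding with the page's charset recovers the value.
        System.out.println(original.equals(decode(sent, "ISO-8859-1"))); // true

        // Decoding the same bytes as UTF-8 does not: 0xE9 alone is not
        // a valid UTF-8 sequence.
        System.out.println(original.equals(decode(sent, "UTF-8"))); // false
    }
}
```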

I agree it's not as clean as you may want, but are there any real
problems with decoding the path portion using one charset and the
query string with another (i.e., the one from getCharacterEncoding()),
the way it used to be done?
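
A minimal sketch of that split, assuming UTF-8 for the path and ISO-8859-1
as a stand-in for whatever getCharacterEncoding() would return. The class,
the percentDecode helper, and the sample URI are all hypothetical, and this
is not how Tomcat implements it.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: decodes the two parts of a URI with different charsets.
final class SplitDecodeDemo {
    /**
     * Turns %HH escapes into bytes and interprets the bytes in the given
     * charset. Input is expected to be ASCII; malformed escapes cause
     * a NumberFormatException.
     */
    static String percentDecode(String raw, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '%' && i + 2 < raw.length()) {
                out.write(Integer.parseInt(raw.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c);
            }
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        URI uri = URI.create("/caf%C3%A9/search?q=%E9t%E9");

        // Path: decoded as UTF-8 so that context and servlet mapping work.
        String path = percentDecode(uri.getRawPath(), StandardCharsets.UTF_8);

        // Query: decoded with the request's charset (assumed ISO-8859-1).
        // Real code would split on '&' and '=' before decoding each piece.
        String query = percentDecode(uri.getRawQuery(), StandardCharsets.ISO_8859_1);

        System.out.println(path.equals("/caf\u00e9/search")); // true
        System.out.println(query.equals("q=\u00e9t\u00e9"));  // true
    }
}
```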

Hans

>>-----Original Message-----
>>From: Hans Bergsten [mailto:hans@gefionsoftware.com] 
>>Sent: Friday, November 21, 2003 1:55 PM
>>To: Tomcat Developers List
>>Subject: Re: Justification for URIEncoding addition?
>>
>>
>>Remy Maucherat wrote:
>>
>>>Larry Isaacs wrote:
>>>
>>>
>>>>Hi Remy,
>>>>
>>>>Okay, re-reviewed the original 22666 thread.  To complete this thread,
>>>>I'll assume the following from RFC2718 is our justification for the
>>>>new behavior:
>>>>
>>>>      Unless there is some compelling reason for a
>>>>      particular scheme to do otherwise, translating character
>>>>      sequences into UTF-8 (RFC 2279) [3] and then subsequently
>>>>      using the %HH encoding for unsafe octets is recommended.
>>>>
>>>>Tomcat will default to US-ASCII instead of UTF-8 so it won't break
>>>>too many existing webapps.  If there are other parts to this story,
>>>>I would be interested in learning of them.
>>>>
>>>>I'm still concerned that this makes Tomcat less useful by creating
>>>>deployment problems for webapps that aren't technically broken.
>>>>However, these issues were covered in the prior e-mail thread
>>>>(http://www.mail-archive.com/tomcat-dev@jakarta.apache.org/msg46479.html),
>>>>so I'll drop the issue.  Thanks.
>>>
>>>
>>>The idea for the change is that there's no compelling reason (except
>>>hacking) to have one part of the URI be in some encoding (US-ASCII or
>>>UTF-8, if you want to have any chance of mapping it successfully), and
>>>the rest encoded in something else.
>>>
>>>There's indeed a bug thread on this issue, and I was on your side at first.
>>
>>I've browsed through the thread referenced above as well as the comments
>>on bug 22666. Sorry if I'm missing something here, but to me it seems
>>like what Craig did for TC 4.x is the solution that's less harmful,
>>namely let Jasper get the "jsp_precompile" parameter by scanning the
>>getQueryString() result instead of using getParameter().
>>
>>It's clear that enforcing the RFC2718 recommendation breaks a lot of
>>apps (based on all the bug reports and questions about this), and AFAIK,
>>most commonly used browsers (or all of them) use the encoding of the
>>page to encode parameters in both the body and the query string. It
>>therefore seems reasonable to use the setCharacterEncoding() value to
>>decode both types of parameters (at least as a default) and fix 22666
>>by avoiding the premature call to getParameter() that Jasper does in
>>the same way as it's done in TC 4.
>>
>>My apologies if I missed a part of the thread that discussed this
>>solution and found it flawed.
>>
>>Hans

-- 
Hans Bergsten                                <hans@gefionsoftware.com>
Gefion Software                       <http://www.gefionsoftware.com/>
Author of O'Reilly's "JavaServer Pages", covering JSP 2.0 and JSTL 1.1
Details at                                    <http://TheJSPBook.com/>



