tomcat-dev mailing list archives

From Remy Maucherat
Subject Re: Justification for URIEncoding addition?
Date Mon, 24 Nov 2003 13:40:51 GMT
Hans Bergsten wrote:

> Larry Isaacs wrote:
>> Hans,
>> The behavior change is unrelated to the use of getParameter()
>> to search for "jsp_precompile".  Both Tomcat 3.x and Tomcat 4.x
>> were bitten by this long ago and Craig's fix was applied to both.
>> In Tomcat 4's case, it was prior to the 4.0 release.
> Okay, I'm sure you're right that there may be more to it than
> avoiding the getParameter() call in Jasper, but based on what
> I've read, it seems to be part of the problem at least.
>> Assuming I have a good grip on the issue, I think it relates
>> to using UTF-8 to decode the path portion of the URL which
>> gets used to determine context, servlet mapping, etc.  Then
>> allowing setCharacterEncoding() to change the character encoding
>> for the query portion of the same URL.  The Servlet 2.3 and 2.4
>> specs both say setCharacterEncoding() applies to the request body
>> but don't mention it applying to the query portion of the URL.
> Right, but since the servlet spec doesn't say anything about encoding
> for the query portion, I think we have some room for a sensible
> interpretation.
> My concern is that with the new decoding behavior, apps that used to
> work fine suddenly don't, and the reason seems to be that browsers
> in fact ignore the RFC2718 recommendation that TC now enforces. I'm
> all for compliance with all related specs, but in this case it's just
> a recommendation and following it seems to do more harm than good.
> I agree it's not as clean as you may want, but are there any real
> problems with decoding the path portion using one charset and the
> query string with another (i.e., the one from getCharacterEncoding()),
> the way it used to be done?

I see you listed as a member of the expert group for the servlet spec. Did
you raise these points during the review period? If not, then IMO you
have nothing to complain about, especially since Tomcat implements a far
simpler and more reasonable behavior for URL string handling.

The specification should have specified something along the lines of:
- The URL must be %xx encoded
- This decodes to bytes representing UTF-8 characters
There's an IETF standard that, I think, states this in black and white. It
is being ignored. Maybe this wouldn't be the case if very popular tech, such
as servlets & JSPs, started mandating it? This is simply a chicken & egg issue.
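
To make the two rules above concrete, here is a minimal Java sketch. It
assumes the standard java.net.URLDecoder as a stand-in for a container's
URL decoder; it is not Tomcat's actual code, and the class and method names
are made up for illustration. It decodes the same %xx sequence once as UTF-8
and once as ISO-8859-1, which shows why both ends have to agree on the charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Tomcat or the servlet API.
public class UrlDecodeSketch {

    // Turn %xx escapes into bytes, then interpret those bytes in the given charset.
    // Note: URLDecoder also maps '+' to a space (form encoding), which a real
    // path decoder would not do; the inputs below avoid '+'.
    public static String decode(String encoded, Charset charset) {
        try {
            return URLDecoder.decode(encoded, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for charsets taken from StandardCharsets.
            throw new IllegalStateException(e);
        }
    }

    // The behavior proposed above: bytes are always UTF-8.
    public static String decodeUtf8(String encoded) {
        return decode(encoded, StandardCharsets.UTF_8);
    }

    // What a decoder assuming ISO-8859-1 would produce from the same input.
    public static String decodeLatin1(String encoded) {
        return decode(encoded, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "caf" followed by U+00E9 (e with acute accent), UTF-8 encoded then %xx escaped.
        String encoded = "caf%C3%A9";
        System.out.println(decodeUtf8(encoded));   // 4 characters
        System.out.println(decodeLatin1(encoded)); // 5 characters, garbled
    }
}
```

If the sender escaped UTF-8 bytes and the receiver decodes them as
ISO-8859-1 (or the reverse), every non-ASCII character is mangled, which is
the failure mode being discussed in this thread.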

i18n issues with HTTP and servlets have been known about for years, but
unfortunately they still haven't been addressed properly.
Same with the request dispatcher + wrapping issues that I pointed
out months ago (and which, of course, were silently ignored).

To balance this a little, among the other big issues, I have to give 
credit for solving the welcome files in a satisfactory way, as well as 
filters with RDs. Filters now make the proprietary APIs provided by the 
container irrelevant for most tasks.

