tomcat-dev mailing list archives

From Remy Maucherat <r...@apache.org>
Subject Re: [PATCH] Bug 22666
Date Sun, 07 Sep 2003 06:00:26 GMT
Mark Thomas wrote:

> This is obviously a bigger mess than I first thought. As I see it, the 
> following options exist for resolving bug 22666.
> 
> 1. WONTFIX - On the basis that there is too much uncertainty to do anything 
> sensible and that any changes made might break interoperability as per Remy's 
> point 3 below.
> 
> 2. FIX - Patch the parameter class (as per Remy's point 2 below) on the grounds 
> that the JSP spec states "The World Wide Web Consortium (http://www.w3.org/) is 
> a definitive source of HTTP related information affecting this specification 
> and its implementations." and the w3c view 
> (http://www.w3.org/International/O-URL-code.html) is that URI encoding should 
> always be based on UTF-8. However, this is still likely to break things (back 
> to Remy's point 3).
> 
> 3. FIX - Add a configuration option that enables w3c compliant URI decoding and 
> patch the parameter and any other relevant classes to support this option. I am 
> not 100% sure where the best place to do this would be. I am leaning towards 
> adding it to the context as an optional parameter with a default state of 
> disabled.
> 
> There are several bugs in bugzilla that look as if they are on similar lines 
> and on that basis my own view is that option 3 is the way to go. Before I start 
> coding, I would be grateful for some feedback/guidance on my planned approach.

I'll vote almost 2 ;-) No client I know of consistently uses UTF-8 to
encode the URL; however, I'm not sure clients use the encoding of the
entity body to encode the URL either.
Proper character decoding of the %xx-decoded URL is already done (see
CoyoteAdapter.convertURI), and there's a connector.getURIEncoding()
which indicates what encoding is to be used for the URL. Note: The
default is US-ASCII (because anything else doesn't work), but you can be
compliant with the W3C and use UTF-8 :) For more flexibility, we can
either add a new connector field for the query string (let's call it
connector.getQueryStringEncoding()), or reuse connector.getURIEncoding().
The value would be passed to the Parameters class and used exclusively
for decoding the query string (the POSTed stuff won't use it,
obviously). I want (I have to insist ;-) ) the default to be US-ASCII
(so the feature will work in the real world), with a quick and dirty B2C
conversion in that particular case (like CoyoteAdapter.convertURI).
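
To make the two steps concrete, here is a rough standalone sketch of
what is described above: first undo the %xx escapes to get bytes, then
turn those bytes into characters using a configurable encoding, with a
plain byte-to-char cast in the US-ASCII default case. This is not Tomcat
code. The class and method names (QueryStringDecodingSketch,
percentDecode, toChars, decode) are made up for illustration, and
reading "quick and dirty B2C conversion" as a cast is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Tomcat.
public class QueryStringDecodingSketch {

    /** Step 1: undo %xx escapes (and '+' for space), producing raw bytes. */
    static byte[] percentDecode(String raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '%') {
                if (i + 2 >= raw.length()) {
                    throw new IllegalArgumentException("Truncated escape at " + i);
                }
                int hi = Character.digit(raw.charAt(i + 1), 16);
                int lo = Character.digit(raw.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Bad escape at " + i);
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else if (c == '+') {
                out.write(' ');
            } else {
                out.write(c & 0xff);
            }
        }
        return out.toByteArray();
    }

    /** Step 2: bytes to chars, using the configured encoding. */
    static String toChars(byte[] bytes, Charset encoding) {
        // Assumed default path: one char per byte, no real charset decoding.
        if (encoding == null || StandardCharsets.US_ASCII.equals(encoding)) {
            char[] chars = new char[bytes.length];
            for (int i = 0; i < bytes.length; i++) {
                chars[i] = (char) (bytes[i] & 0xff);
            }
            return new String(chars);
        }
        return new String(bytes, encoding);
    }

    /** Both steps together, applied to a query string only. */
    static String decode(String rawQuery, Charset encoding) {
        return toChars(percentDecode(rawQuery), encoding);
    }

    public static void main(String[] args) {
        String raw = "name=caf%C3%A9&q=a+b";
        System.out.println(decode(raw, StandardCharsets.UTF_8));
        System.out.println(decode(raw, StandardCharsets.US_ASCII));
    }
}
```

With the UTF-8 setting the two escaped bytes become a single character;
with the default they come out as two separate characters, which is the
trade-off being accepted for the default.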

Overall, this looks the most reasonable and flexible.

Note: If you want to code it, you'd better do it really fast ;-)

Remy


