hc-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 12798] - Path should not be encoded in HttpMethodBase
Date Fri, 07 Mar 2003 23:13:28 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12798>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12798

Path should not be encoded in HttpMethodBase





------- Additional Comments From adrian@ephox.com  2003-03-07 23:13 -------
This patch doesn't look right to me. I'm no expert, but I have just recently 
had to review our URL encoding code, so it concerns me that we don't seem to 
be encoding the query string.  RFC 1738 (Uniform Resource Locators) specifies 
that:

   Octets must be encoded if they have no corresponding graphic
   character within the US-ASCII coded character set, if the use of the
   corresponding character is unsafe, or if the corresponding character
   is reserved for some other interpretation within the particular URL
   scheme.

The unsafe characters listed in the RFC are:
"{", "}", "|", "\", "^", "~",
   "[", "]", "`", "<", ">", """, "#", "%"

In addition, the reserved characters are:
";", "/", "?", ":", "@", "=" and "&"

It then adds:
   Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL.

   On the other hand, characters that are not required to be encoded
   (including alphanumerics) may be encoded within the scheme-specific
   part of a URL, as long as they are not being used for a reserved
   purpose.

Now, what this implies to me is that the process for encoding any given URL is:

1. Break the URL into its various parts. For HTTP this would be:
http://<host>:<port>/<path>?<searchpart>

2. Take each part of the URL and encode it (though one would hope that a host 
name contains only US-ASCII characters or the DNS system is going to have 
trouble with it anyway).

3. Reassemble the URL.
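
As a rough sketch of the three steps above (not the HttpClient code, and not 
a complete RFC 1738 implementation), something like the following could work. 
The class and method names are hypothetical. It assumes the input is an 
unencoded URL with a scheme and a path, that octets are taken from UTF-8 
(the RFC does not fix a charset), and that only "/" in the path and "&" and 
"=" in the search part are being used for their reserved purposes.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of HttpClient. */
final class Rfc1738Sketch {
    // Special characters RFC 1738 allows unencoded, besides alphanumerics.
    private static final String SPECIAL = "$-_.+!*'(),";

    /** Percent-encodes each octet unless it is alphanumeric, special, or listed in keep. */
    static String encodePart(String part, String keep) {
        StringBuilder out = new StringBuilder();
        // Assumption: octets come from UTF-8.
        for (byte b : part.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            char c = (char) octet;
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum || SPECIAL.indexOf(c) >= 0 || keep.indexOf(c) >= 0) {
                out.append(c);
            } else {
                out.append('%')
                   .append(Character.toUpperCase(Character.forDigit(octet >> 4, 16)))
                   .append(Character.toUpperCase(Character.forDigit(octet & 0xF, 16)));
            }
        }
        return out.toString();
    }

    /** Applies steps 1 to 3 to an unencoded http://<host>:<port>/<path>?<searchpart> URL. */
    static String encodeHttpUrl(String rawUrl) {
        // Step 1: break the URL into its parts.
        int hostStart = rawUrl.indexOf("://") + 3;
        int pathStart = rawUrl.indexOf('/', hostStart);
        if (pathStart < 0) {
            pathStart = rawUrl.length();
        }
        int queryStart = rawUrl.indexOf('?', pathStart);
        String prefix = rawUrl.substring(0, pathStart); // scheme, host and port, left as-is
        String path = queryStart < 0
                ? rawUrl.substring(pathStart)
                : rawUrl.substring(pathStart, queryStart);
        String query = queryStart < 0 ? null : rawUrl.substring(queryStart + 1);

        // Step 2: encode each part, keeping only the reserved characters
        // that are used for their reserved purpose in that part.
        String encodedPath = encodePart(path, "/");
        String encodedQuery = query == null ? null : encodePart(query, "&=");

        // Step 3: reassemble the URL.
        StringBuilder url = new StringBuilder(prefix).append(encodedPath);
        if (encodedQuery != null) {
            url.append('?').append(encodedQuery);
        }
        return url.toString();
    }
}
```

Note that a literal "%" in the input is encoded as "%25", so feeding an 
already encoded URL through this method would encode it twice.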

Now, I'm somewhat unsure as to whether the URL we are given is encoded or not, 
and the Javadocs for the methods do not specify this.  So the first action 
item of this bug must be to decide whether methods should be passed an encoded 
or an unencoded URL, and to document that decision.

IF we decide that URLs passed into the methods should be encoded, then we need 
to stop encoding the path.  On the other hand, IF we decide that URLs passed 
into the methods should be unencoded, then we need to encode the query string 
as well as the path.

Also, if all URLs are being passed in encoded, then we should have no need for 
URL encoding functionality as we should only ever use encoded URLs.

My suggestion would be to only ever work with encoded URLs, but then do one of 
the following:

1. Add a new constructor to each of the methods which takes a boolean to 
determine whether the URL is encoded or not.  If it is not, we encode it 
before passing it through to anywhere else.

2. Provide the URIUtils class (possibly as a separate project) to allow the 
user to easily encode URLs.  We should ensure that there is a method in 
URIUtils that can take a full URL with non-displayable US-ASCII characters and 
unsafe characters (but no extra reserved characters) and encode it correctly.  
This saves the user from having to break up the URL to encode it.
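
To make the two options concrete, here is a minimal sketch of how they might 
look.  Everything in it is hypothetical: SketchMethod stands in for an HTTP 
method class, encodeFullUrl stands in for the kind of method URIUtils could 
offer, and the encoding is delegated to the multi-argument java.net.URI 
constructor (which quotes illegal characters) rather than to any existing 
HttpClient code.  It assumes the URL has a scheme followed by "://".

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical stand-in for an HTTP method class; names are illustrative only. */
final class SketchMethod {
    private final String encodedUrl;

    /** The caller promises that the URL is already encoded. */
    SketchMethod(String encodedUrl) {
        this.encodedUrl = encodedUrl;
    }

    /** Option 1: a boolean says whether the URL still needs to be encoded. */
    SketchMethod(String url, boolean alreadyEncoded) {
        this.encodedUrl = alreadyEncoded ? url : encodeFullUrl(url);
    }

    /** Option 2: a utility that encodes a full, unencoded URL in one call. */
    static String encodeFullUrl(String rawUrl) {
        // Split the URL so that each part can be quoted on its own.
        int schemeEnd = rawUrl.indexOf("://");
        int pathStart = rawUrl.indexOf('/', schemeEnd + 3);
        if (pathStart < 0) {
            pathStart = rawUrl.length();
        }
        int queryStart = rawUrl.indexOf('?', pathStart);
        String scheme = rawUrl.substring(0, schemeEnd);
        String authority = rawUrl.substring(schemeEnd + 3, pathStart);
        String path = queryStart < 0
                ? rawUrl.substring(pathStart)
                : rawUrl.substring(pathStart, queryStart);
        String query = queryStart < 0 ? null : rawUrl.substring(queryStart + 1);
        try {
            // The multi-argument constructor quotes illegal characters per component.
            return new URI(scheme, authority, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode URL: " + rawUrl, e);
        }
    }

    String getEncodedUrl() {
        return encodedUrl;
    }
}
```

With this sketch, new SketchMethod("http://example.com/my docs/a b.txt", false) 
stores the URL with each space replaced by "%20", while passing true stores 
the string untouched.  Passing an already encoded URL with the flag set to 
false encodes the "%" again, which is why the expected form needs to be 
documented either way.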
