httpd-modules-dev mailing list archives

From "Thibaut VARENE" <T-B...@parisc-linux.org>
Subject Re: help with ap_escape_uri()
Date Sat, 05 May 2007 18:18:03 GMT
On 5/5/07, David Wortham <djwortham@gmail.com> wrote:
> Thibaut,
>    As far as I know, URI escaping functions escape all non-alphanumerics
> which are not in the following set of characters: {'-', '_', ':', '/', '?',
> '=', '&', '#', '.'} (there may be others I can't think of right now).  If a
> character is in that set of characters, the URI remains "legal" even if the
> character is unescaped.  This set of characters is

That doesn't seem correct: ap_escape_uri() certainly escapes ';', '#'
and '?' for instance (I just verified this).

>    A reason for this:
> If you start with a link
> (http://www.nowhere.com/some_dir?where_you_going=nowhere#top),
> there are a number of special characters that are required to parse the URI
> correctly.
> Without these characters: {'/', ':'}, there can be no "http://".
>  Without this character: {'?'}, there is no query string... only a run-on
> directory-path.
>  Without this character: {'#'}, there is no anchor... only an incorrectly
> long GET parameter value.

I agree but I don't think that's the scope of ap_escape_uri() (which
is ap_os_escape_path() behind the scenes). I understand that this
function should precisely escape all 'reserved' characters found in
file paths so that they do not interfere with the normal parsing of
queries. The issue here is exactly that: if a filename contains a '&',
it will be interpreted as an argument list and break anything that does
URL parsing (such as what is reported in the Debian bug report I
pointed at). I don't get why ap_escape_uri() correctly escapes '?' to
avoid this, but not '&'.

>    This is not a bug; you need to manually escape any of the special
> characters (probably called URI META characters or something like that) if
> you expect them to be URL-encoded.  If all '&' characters were URI-escaped
> all of the time, there would be no way to create a GET parameter list; there
> would never be more than one parameter.

See above, I still believe this is a bug, or there's some kind of
inconsistency I don't understand... RFC1738 seems to claim that:

"Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL."

/reserved characters for their reserved purpose may be used
unencoded/, but it says that outside of their scope they must also be
encoded. My understanding is that ap_os_escape_path() should only be
used on the path part of a URL and as such it should encode the
reserved characters that are not to be found in the path part of
said URL... That includes '&'.
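
To make that reading concrete, here is a minimal standalone sketch of a
single-pass escaper for one path segment (a filename). It is not Apache's
ap_os_escape_path(); the names escape_path_segment() and is_ascii_alnum() are
hypothetical, and malloc() stands in for whatever allocator a module would
use. It assumes that only ASCII alphanumerics and the RFC 1738 special
characters "$-_.+!*'()," may stay unencoded, so '&', '?', '#', ';' and '/'
all get percent-encoded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: ASCII-only alphanumeric test, independent of locale. */
static int is_ascii_alnum(unsigned char c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z');
}

/*
 * Hypothetical function, not part of the Apache API.
 * Percent-encodes every byte of a single path segment except ASCII
 * alphanumerics and the RFC 1738 special characters $-_.+!*'(),
 * Returns a malloc()ed string that the caller must free(), or NULL
 * if the allocation fails.
 */
char *escape_path_segment(const char *segment)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char safe[] = "$-_.+!*'(),";
    char *out, *dst;

    assert(segment != NULL);

    /* Worst case: every input byte becomes a three-byte %XX sequence. */
    out = malloc(3 * strlen(segment) + 1);
    if (out == NULL)
        return NULL;

    dst = out;
    for (; *segment != '\0'; ++segment) {
        unsigned char c = (unsigned char)*segment;
        if (is_ascii_alnum(c) || strchr(safe, c) != NULL) {
            *dst++ = (char)c;          /* allowed unencoded */
        } else {
            *dst++ = '%';              /* everything else, including '&' */
            *dst++ = hex[c >> 4];
            *dst++ = hex[c & 0x0F];
        }
    }
    *dst = '\0';
    return out;
}
```

With this sketch, escape_path_segment("a&b.txt") yields "a%26b.txt", so the
'&' in a filename can no longer be mistaken for an argument separator.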

>    As for a workaround, you will need to find a pool-friendly (assuming you
> are using pools for memory allocation in this specific instance)
> character/substring replacement function.  You will likely want to do a
> straight encode of all components of a URI separately with this function
> then use the ap_escape_uri().  I am not familiar with a particular function
> that will do the trick, but I use a pool-modified version of a Yahoo!
> C-library function for URL-encoding.

It seems like costly overkill to me to have to do a second
pass of search-and-replace just to escape the '&' that ap_escape_uri() has
left aside...
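
For reference, the second pass in question could look roughly like the sketch
below. The function name escape_ampersands() is hypothetical, and malloc()
stands in for pool allocation; a module would more likely allocate from the
request pool. It shows the extra cost being discussed: one scan to count the
'&' characters, one allocation, and one copy, on top of the work the first
escaping pass has already done.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical second-pass helper, not part of the Apache API.
 * Takes a string that has already been through a path-escaping function
 * and replaces every remaining literal '&' with "%26".
 * Returns a malloc()ed string that the caller must free(), or NULL
 * if the allocation fails.
 */
char *escape_ampersands(const char *escaped)
{
    size_t len = 0, extra = 0;
    const char *p;
    char *out, *dst;

    assert(escaped != NULL);

    /* First scan: measure the input and count the growth ('&' -> "%26"). */
    for (p = escaped; *p != '\0'; ++p, ++len) {
        if (*p == '&')
            extra += 2;
    }

    out = malloc(len + extra + 1);
    if (out == NULL)
        return NULL;

    /* Second scan: copy, substituting "%26" for each '&'. */
    dst = out;
    for (p = escaped; *p != '\0'; ++p) {
        if (*p == '&') {
            memcpy(dst, "%26", 3);
            dst += 3;
        } else {
            *dst++ = *p;
        }
    }
    *dst = '\0';
    return out;
}
```

Applied to "R%20&%20B.mp3", this returns "R%20%26%20B.mp3"; existing %XX
sequences are left untouched because only the literal '&' is rewritten.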

Thanks for your feedback, but I'd like to see more arguments claiming
that this is a feature and not a bug ;)

Thibaut

PS: please CC me in replies.

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
