httpd-modules-dev mailing list archives

From "David Wortham" <djwort...@gmail.com>
Subject Re: help with ap_escape_uri()
Date Sat, 05 May 2007 19:22:05 GMT
Thibaut,
   Point taken.  I didn't have any trouble with ap_escape_uri, but then
again I'm not testing on Debian.

   I fear the problem is that the "ap_escape_uri(...)" function has been
turned into a macro for the "ap_os_escape_path(...)" function (as you said
before).  The character mapping table that ap_os_escape_path uses appears to
be OS-dependent: I saw some tables that treated both '&' and '?' as characters
to escape and some that treated only '?' that way.  I'm sure some of the other
characters that should be escaped will get skipped on certain OSes too.

   Since the list of escapable URI characters is governed by an RFC and is
OS-independent, the function should probably not have been merged into the
OS-dependent function (I saw mailing list archives around 2002 where coders
were actively changing code from using "ap_os_escape_path" calls to
"ap_escape_uri", so I am assuming they were once independent functions).

   My assumption is that this difference will only show up on certain
OSes, but that the true "bug" is that "ap_escape_uri" is functionally the
same as "ap_os_escape_path" when the two should be different.  In either case,
I think your solution is to use a third-party function (or write your own) to
URI-encode, unless you can guarantee that your module is compiled against an
RFC-compliant URI_encode function.
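
A minimal sketch of the "write your own" option, for illustration only. It
assumes the input is a single path segment (for example a filename) and
percent-encodes every byte outside the RFC 3986 "unreserved" set, so '&', '?',
'#' and ';' are always escaped regardless of the OS. The names
uri_encode_segment and is_unreserved are hypothetical and are not part of the
httpd API; the choice of the RFC 3986 unreserved set is also an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 for bytes that are left unescaped.
 * This is the RFC 3986 "unreserved" set: ALPHA / DIGIT / "-" / "." / "_" / "~". */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper: percent-encodes one path segment.
 * Returns a malloc'd, NUL-terminated string that the caller must free,
 * or NULL if the allocation fails. */
char *uri_encode_segment(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(in);
    char *out = malloc(3 * len + 1);   /* worst case: every byte becomes %XX */
    char *p = out;

    if (out == NULL) {
        return NULL;
    }
    for (; *in != '\0'; ++in) {
        unsigned char c = (unsigned char)*in;
        if (is_unreserved(c)) {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

Because '/' is escaped too, each path component would be encoded separately
and then joined with literal slashes. Inside a module, the malloc call would
presumably be swapped for a pool allocation such as apr_palloc on the request
pool, so the result is released along with the request.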

Regards,
Dave

P.S. Please DON'T CC me in replies unless it is a BCC.  I am trying to make
it harder for email harvesters to get my addresses, not easier.  Thanks.




On 5/5/07, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
>
> On 5/5/07, David Wortham <djwortham@gmail.com> wrote:
> > Thibaut,
> >    As far as I know, URI escaping functions escape all non-alphanumerics
> > which are not in the following set of characters: {'-', '_', ':', '/', '?',
> > '=', '&', '#', '.'} (there may be others I can't think of right now).  If a
> > character is in that set of characters, the URI remains "legal" even if the
> > character is unescaped.  This set of characters is
>
> That doesn't seem correct: ap_escape_uri() certainly escapes ';', '#'
> and '?' for instance (i just verified this).
>
> >    A reason for this:
> > If you start with a link
> > (http://www.nowhere.com/some_dir?where_you_going=nowhere#top),
> > there are a number of special characters that are required to parse the URI
> > correctly.
> > Without these characters: {'/', ':'}, there can be no "http://".
> >  Without this character: {'?'}, there is no query string... only a run-on
> > directory-path.
> >  Without this character: {'#'}, there is no anchor... only an incorrectly
> > long GET parameter value.
>
> I agree but I don't think that's the scope of ap_escape_uri() (which
> is ap_os_escape_path() behind the scenes). I understand that this
> function should precisely escape all 'reserved' characters found in
> file paths so that they do not interfere with the normal parsing of
> queries. The issue here is exactly that: if a filename contains a '&',
> it will be interpreted as an argument list and break anything that does
> URL parsing (such as what is reported in the debian bug report I
> pointed at). I don't get why ap_escape_uri() correctly escapes '?' to
> avoid this, but not '&'.
>
> >    This is not a bug; you need to manually escape any of the special
> > characters (probably called URI META characters or something like that) if
> > you expect them to be URL-encoded.  If all '&' characters were URI-escaped
> > all of the time, there would be no way to create a GET parameter list; there
> > would never be more than one parameter.
>
> See above, I still believe this is a bug, or there's some kind of
> incoherency I don't understand... RFC1738 seems to claim that:
>
> "Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
>    reserved characters used for their reserved purposes may be used
>    unencoded within a URL."
>
> /reserved characters for their reserved purpose may be used
> unencoded/, but it says that outside of their scope they must also be
> encoded. My understanding is that ap_os_escape_path() should only be
> used on the path part of a URL and as such it should encode the
> reserved characters that are not to be found in the path part of
> said URL... That includes '&'.
>
> >    As for a workaround, you will need to find a pool-friendly (assuming you
> > are using pools for memory allocation in this specific instance)
> > character/substring replacement function.  You will likely want to do a
> > straight encode of all components of a URI separately with this function
> > then use the ap_escape_uri().  I am not familiar with a particular function
> > that will do the trick, but I use a pool-modified version of a Yahoo!
> > C-library function for URL-encoding.
>
> It seems extremely overkill and costly to me to have to do a second
> pass of search-n-replace just to escape '&' that ap_escape_uri() has
> left aside...
>
> Thanks for your feedback, but I'd like to see more arguments claiming
> that this is a feature and not a bug ;)
>
> Thibaut
>
> PS: please CC-me in replies.
>
> --
> Thibaut VARENE
> http://www.parisc-linux.org/~varenet/
>



-- 
David Wortham
Senior Web Applications Developer
Unspam Technologies, Inc.
(408) 338-8863
