httpd-dev mailing list archives

From Ben Laurie <...@gonzo.ben.algroup.co.uk>
Subject Re: # in file names...
Date Tue, 03 Oct 1995 09:22:52 GMT
> 
> > > Ben wrote:
> > > > > Ok, how about:
> > > > > 
> > > > >       % touch "foo^V^Mbar"            ie foo, ctrl-M, bar
> > > > > 
> > > > > This produces a file that:
> > > > > 
> > > > > a)    looks like sh*t on the screen
> > > >
> > > > Well, some kind of escaping needs to be done for the text, too. That
> > > > could take a little more discussion than fixing the URI.
> > > 
> > > Nearly there. Note that escape_uri is a misnomer; it should really be
> > > called escape_http_path, and it is currently trying to do two things.
> > > 
> > > 1. Escape a path to make a valid URL path.
> > > 2. Escape a URL path so that it can be used in an HTML document.
> > > 
> > > For 1, it needs to % escape _all_ characters except for
> > > a-z A-Z 0-9 $ - _ . + ! * ' ( ) , : @ & =
> > 
> > This is not my reading of RFC 1808. There the "unreserved" characters are
> > defined to be "alpha | digit | safe | extra".
> > Alpha and digit are as we expect, safe is "$-_.+" and extra is "!*'(),". It
> > may be that there are additional characters which can safely be used in the
> > context of an FTP URL,
> 
> I think you mean an HTTP URL, and the extra characters allowed are : @ & =

You're right. Directory listings make me think FTP. Oops.

> 
> > but there is no harm in escaping them. Section 5.3 specifically recommends
> > against the unescaped use of ":",
> 
> Correct, it is harmless. In fact 5.3 recommends prefixing relative
> URLs with ./ to avoid problems with ':';

I know, I was reinterpreting on the fly.

> however, it would be simpler for
> escape_uri to escape ':'.
> 
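[An illustration of the ':' problem discussed above. This is a minimal
sketch that uses Python's urllib.parse as a stand-in for a client's URL
parser, which is an assumption; real browsers may behave differently. The
file name "foo:bar" is made up for the example.]

```python
from urllib.parse import urlsplit

# A file name containing ':' used as a bare relative URL is parsed as
# "scheme:rest", so the part before the colon is taken for a scheme.
ambiguous = urlsplit("foo:bar")

# With the "./" prefix, the text before ':' contains '/' and can no
# longer be a scheme name, so the whole string stays a relative path.
prefixed = urlsplit("./foo:bar")

# Escaping ':' as %3A removes the ambiguity without needing a prefix.
escaped = urlsplit("foo%3Abar")

for label, parts in (("bare", ambiguous), ("prefixed", prefixed), ("escaped", escaped)):
    print(label, repr(parts.scheme), repr(parts.path))
```

Either approach avoids the misparse; escaping has the advantage that the
caller does not need to know whether the name will be used as the first
segment of a relative URL.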
> > and ":@&=" are all reserved in a generic-RL.
> 
> Yes, but you are allowed to use reserved characters! reserved != forbidden
> Reserved means that they _may_ be defined to have special semantics.
> Whereas unreserved characters cannot be defined to have special semantics,
> I think.

Maybe so; however, I see no reason to leave these characters unescaped.
Leaving them alone improves the output only slightly, and may break later
when Apache supports new semantics.
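
[A minimal sketch of the conservative rule argued for here: %-escape every
byte outside the RFC 1808 "unreserved" set (alpha | digit | safe | extra).
It is written in Python for brevity and is not Apache's escape_uri. The
name escape_path_segment is hypothetical, and the function assumes its
input is a single file name, so '/' is escaped too.]

```python
import string

# Unreserved characters as quoted in this thread from RFC 1808.
SAFE = "$-_.+"
EXTRA = "!*'(),"
UNRESERVED = frozenset(string.ascii_letters + string.digits + SAFE + EXTRA)


def escape_path_segment(name: bytes) -> str:
    """%-escape every byte of a file name that is not unreserved.

    Hypothetical helper for illustration only.
    """
    out = []
    for byte in name:
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Reserved characters (: @ & = # ...), control characters,
            # '%' itself and all non-ASCII bytes end up here.
            out.append("%{:02X}".format(byte))
    return "".join(out)


print(escape_path_segment(b"foo\rbar"))   # the ctrl-M file from the thread
print(escape_path_segment(b"a#b:c@d&e=f"))
```

Because ':', '@', '&' and '=' are escaped along with everything else, the
result does not depend on which reserved characters later acquire special
meaning.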

> 
> > > For 2, only the & needs to be escaped, assuming the HREF is enclosed in
> > > double quotes ("), so all characters except for
> > > a-z A-Z 0-9 $ - _ . + ! * ' ( ) , : @ =
> > > should be escaped.
> > 
> > When does Apache need to do this?
> 
> When it outputs a directory listing (as in the original bug report); this is
> the raison d'etre of unescape_uri. (Properly called unescape_httppath.)
> 
> So our list of acceptable characters in a path is now
> a-z A-Z 0-9 $ - _ . + ! * ' ( ) , @ =

See above.
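
[A sketch of how one directory-listing entry might be produced once every
non-unreserved byte is %-escaped: the HREF then contains no '&' or '"', so
only the visible link text needs HTML escaping. All names are hypothetical.
Decoding the name as Latin-1 and showing non-printable characters as '?'
are assumptions made for the example, not something settled in this
thread.]

```python
import html
import string

# RFC 1808 unreserved set: alpha | digit | safe | extra.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "$-_.+" + "!*'(),")


def escape_path_segment(name: bytes) -> str:
    """%-escape every byte that is not unreserved (hypothetical helper)."""
    return "".join(
        chr(b) if chr(b) in UNRESERVED else "%{:02X}".format(b) for b in name
    )


def listing_entry(name: bytes) -> str:
    """Build one anchor for a directory listing (hypothetical helper)."""
    href = escape_path_segment(name)
    # Assumption: treat the name as Latin-1 and replace anything that is
    # not printable (such as ctrl-M) with '?' in the visible text.
    text = "".join(c if c.isprintable() else "?" for c in name.decode("latin-1"))
    # html.escape handles &, <, > and quotes in the visible text.
    return '<A HREF="{}">{}</A>'.format(href, html.escape(text))


print(listing_entry(b"a&b"))
print(listing_entry(b"foo\rbar"))
```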

> 
>  David.

-- 
Ben Laurie                  Phone: +44 (181) 994 6435
Freelance Consultant        Fax:   +44 (181) 994 6472
and Technical Director      Email: ben@algroup.co.uk
A.L. Digital Ltd,
London, England.
