httpd-dev mailing list archives

From d...@ast.cam.ac.uk (David Robinson)
Subject Re: # in file names...
Date Mon, 02 Oct 1995 17:44:00 GMT
> > Ben wrote:
> > > > Ok, how about:
> > > > 
> > > >       % touch "foo^V^Mbar"            ie foo, ctrl-M, bar
> > > > 
> > > > This produces a file that:
> > > > 
> > > > a)    looks like sh*t on the screen
> > >
> > > Well, some kind of escaping needs to be done for the text, too. That
> > > could take a little more discussion than fixing the URI.
> > 
> > Nearly there. Note that escape_uri is a misnomer; it should really be
> > called escape_http_path, and it is currently trying to do two things.
> > 
> > 1. Escape a path to make a valid URL path.
> > 2. Escape a URL path so that it can be used in an HTML document.
> > 
> > For 1, it needs to % escape _all_ characters except for
> > a-z A-Z 0-9 $ - _ . + ! * ' ( ) , : @ & =
> 
> This is not my reading of RFC 1808. There the "unreserved" characters are
> defined to be "alpha | digit | safe | extra".
> Alpha and digit are as we expect, safe is "$-_.+" and extra is "!*'(),". It
> may be that there are additional characters which can safely be used in the
> context of an FTP URL,

I think you mean an HTTP URL, and the extra characters allowed are : @ & =

> but there is no harm in escaping them. Section 5.3 specifically recommends
> against the unescaped use of ":",

Correct, it is harmless. In fact 5.3 recommends prefixing relative
URLs with ./ to avoid problems with ':'; however, it would be simpler for
escape_uri to escape ':'.
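
To illustrate the ':' problem, here is a rough Python sketch. It is only an
illustration using Python's standard urllib.parse, not anything Apache itself
does; the base URL and the file name foo:bar are made up for the example.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical directory URL and file name, chosen only for illustration.
BASE = "http://example.com/dir/"

# Unescaped, the text before ':' is taken to be a URL scheme.
assert urlsplit("foo:bar").scheme == "foo"

# Prefixing ./ (the RFC 1808 section 5.3 suggestion) keeps it relative.
dot_slash = urljoin(BASE, "./foo:bar")

# Escaping ':' as %3A, the simpler option for escape_uri, also works.
escaped = urljoin(BASE, "foo%3Abar")

# The bare name is not resolved against the base at all.
bare = urljoin(BASE, "foo:bar")
```

Either workaround gives a URL under the listing's directory, whereas the bare
name comes back unchanged as if it were an absolute URL.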

> and ":@&=" are all reserved in a generic-RL.

Yes, but you are allowed to use reserved characters! reserved != forbidden.
Reserved means that they _may_ be defined to have special semantics,
whereas unreserved characters cannot be defined to have special semantics,
I think.

> > For 2, only the & needs to be escaped, assuming the HREF is enclosed in
> > double quotes ("), so all characters except for
> > a-z A-Z 0-9 $ - _ . + ! * ' ( ) , : @ =
> > should be escaped.
> 
> When does Apache need to do this?

When it outputs a directory listing (as in the original bug report); this is
the raison d'etre of unescape_uri. (Properly called unescape_httppath.)

So our list of acceptable characters in a path is now
a-z A-Z 0-9 $ - _ . + ! * ' ( ) , @ =
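
As a minimal sketch of what that list implies (in Python for brevity, not the
actual Apache C code), something like the following would do. The function
name follows the escape_http_path suggestion above. Leaving '/' unescaped as
the path separator is my assumption, since it is not in the list.

```python
import string

# Acceptable characters, taken from the list above.
ACCEPTABLE = frozenset(string.ascii_letters + string.digits + "$-_.+!*'(),@=")


def escape_http_path(path: bytes) -> str:
    """%-escape every byte of a path that is not in ACCEPTABLE.

    Assumption: '/' is passed through as the path separator.
    """
    out = []
    for byte in path:
        ch = chr(byte)
        if ch in ACCEPTABLE or ch == "/":
            out.append(ch)
        else:
            # Two upper-case hex digits, e.g. ctrl-M becomes %0D.
            out.append("%%%02X" % byte)
    return "".join(out)


# Ben's example file name: foo, ctrl-M, bar.
listing_href = escape_http_path(b"foo\rbar")
```

Since '&' and '"' are both escaped here, the result should also be safe
inside a double-quoted HREF without a separate HTML-escaping pass.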

 David.
