httpd-dev mailing list archives

From d...@ast.cam.ac.uk (David Robinson)
Subject Re: # in file names...
Date Mon, 02 Oct 1995 14:49:00 GMT
Ben wrote:
> > Ok, how about:
> > 
> >       % touch "foo^V^Mbar"            ie foo, ctrl-M, bar
> > 
> > This produces a file that:
> > 
> > a)    looks like sh*t on the screen
>
> Well, some kind of escaping needs to be done for the text, too. That could
> take a little more discussion than fixing the URI.

Nearly there. Note that escape_uri is a misnomer; it should really be called
escape_http_path, and it is currently trying to do two things.

1. Escape a path to make a valid URL path.
2. Escape a URL path so that it can be used in an HTML document.

For 1, it needs to % escape _all_ characters except for
a-z A-Z 0-9 $ - _ . + ! * ' ( ) , : @ & =

For 2, the & must be escaped as well (and only the &, assuming the HREF is
enclosed in double quotes (")), so all characters except for
a-z A-Z 0-9 $ - _ . + ! * ' ( ) , : @ =
should be escaped.
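
As a rough illustration of the two character sets above, here is a minimal
sketch in C. It is not the existing escape_uri code; the helper
path_char_is_safe, the for_html flag, and the buffer-based signature are all
made up for the example. The list as given does not include '/', so the
sketch assumes its input is a single path segment such as a file name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Apache routine): nonzero if c may appear
 * unescaped. With for_html set, '&' is treated as unsafe as well. */
static int path_char_is_safe(unsigned char c, int for_html)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9'))
        return 1;
    if (c == '\0')
        return 0;               /* strchr would otherwise match the NUL */
    if (c == '&')
        return !for_html;
    return strchr("$-_.+!*'(),:@=", c) != NULL;
}

/* Illustrative signature: %-escape src into dst (capacity dstlen,
 * including the terminating NUL). Returns the escaped length, or -1 if
 * dst is too small. */
static int escape_http_path(const char *src, char *dst, size_t dstlen,
                            int for_html)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (path_char_is_safe(c, for_html)) {
            if (n + 1 >= dstlen)
                return -1;
            dst[n++] = (char)c;
        } else {
            if (n + 3 >= dstlen)
                return -1;
            dst[n++] = '%';
            dst[n++] = hex[c >> 4];
            dst[n++] = hex[c & 0x0F];
        }
    }
    if (n >= dstlen)
        return -1;
    dst[n] = '\0';
    return (int)n;
}
```

With for_html set to 0 this follows the list for case 1; with it set to 1 it
follows the list for case 2. Either way ':' and '+' pass through untouched.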

The current routine escapes : and + unnecessarily. If it were being used
for escaping other parts of a URL (the query string perhaps), then it could
legitimately escape ':'. However, the only significant use of escape_uri
is by mod_dir.c; all other calls to it are immediately followed by a call
to unescape_uri to undo the escaping.

So, change the patch to escape all the characters except those I mentioned;
I would recommend changing the name of the routine.

Of course, that leaves the problem of converting the filename directly to
HTML when used as the text of the anchor. A simple solution would be
to ignore non-printing characters, and assume ISO-8859-1 for the rest.
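
A minimal sketch of that simple solution, again in C and again not existing
server code: the names filename_to_html_text and is_latin1_printable are
invented here. It assumes "non-printing" means the ISO-8859-1 control ranges
(0x00-0x1F and 0x7F-0x9F), and that &, < and > in the remaining text should
become HTML entities; both are my reading rather than anything settled.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: printable ISO-8859-1 is 0x20-0x7E plus 0xA0-0xFF. */
static int is_latin1_printable(unsigned char c)
{
    return (c >= 0x20 && c <= 0x7E) || c >= 0xA0;
}

/* Hypothetical routine: copy a file name into dst as HTML anchor text,
 * dropping non-printing bytes and replacing &, < and > with entities.
 * dstlen includes the terminating NUL. Returns the length written, or -1
 * if dst is too small. */
static int filename_to_html_text(const char *src, char *dst, size_t dstlen)
{
    size_t n = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        const char *rep = NULL;
        size_t len = 1;

        if (!is_latin1_printable(c))
            continue;           /* ignore control characters such as ^M */

        switch (c) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        default:  break;
        }
        if (rep != NULL)
            len = strlen(rep);
        if (n + len >= dstlen)
            return -1;
        if (rep != NULL)
            memcpy(dst + n, rep, len);
        else
            dst[n] = (char)c;
        n += len;
    }
    if (n >= dstlen)
        return -1;
    dst[n] = '\0';
    return (int)n;
}
```

For Ben's example file, "foo^Mbar" would come out as the anchor text
"foobar", while the HREF would still carry the %0D.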

 David.

References:
 Fielding, R., `Relative Uniform Resource Locators', RFC 1808, UC Irvine,
 June 1995.
