httpd-dev mailing list archives

From Ben Laurie <...@gonzo.ben.algroup.co.uk>
Subject Re: # in file names...
Date Mon, 02 Oct 1995 15:48:17 GMT
> 
> Ben wrote:
> > > Ok, how about:
> > > 
> > >       % touch "foo^V^Mbar"            ie foo, ctrl-M, bar
> > > 
> > > This produces a file that:
> > > 
> > > a)    looks like sh*t on the screen
> >
> > Well, some kind of escaping needs to be done for the text, too. That could
> > take a little more discussion than fixing the URI.
> 
> Nearly there. Note that escape_uri is a misnomer; it should really be called
> escape_http_path, and it is currently trying to do two things.
> 
> 1. Escape a path to make a valid URL path.
> 2. Escape a URL path so that it can be used in an HTML document.
> 
> For 1, it needs to % escape _all_ characters except for
> a-z A-Z 0-9 $ - _ . + ! * ' ( ) , : @ & =

This is not my reading of RFC 1808. There the "unreserved" characters are
defined to be "alpha | digit | safe | extra". Alpha and digit are as we expect,
safe is "$-_.+" and extra is "!*'(),". It may be that there are additional
characters which can safely be used in the context of an FTP URL, but there
is no harm in escaping them. Section 5.3 specifically recommends against the
unescaped use of ":", and ":@&=" are all reserved in a generic-RL.
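To make that reading concrete, here is a minimal sketch in Python (not Apache's C code) of a routine that %-escapes every byte outside the RFC 1808 "unreserved" set. The name escape_path and the keep parameter are hypothetical, and leaving "/" unescaped as the path segment separator is an assumption of the sketch, not something the RFC's unreserved set says.

```python
import string

# RFC 1808: unreserved = alpha | digit | safe | extra
SAFE = "$-_.+"
EXTRA = "!*'(),"
UNRESERVED = frozenset(
    (string.ascii_letters + string.digits + SAFE + EXTRA).encode("ascii")
)


def escape_path(path: bytes, keep: bytes = b"/") -> str:
    """Percent-escape every byte not in UNRESERVED or `keep`.

    `keep` defaults to "/" on the assumption that the input is a
    slash-separated path whose separators must survive (hypothetical choice).
    """
    allowed = UNRESERVED | frozenset(keep)
    return "".join(
        chr(b) if b in allowed else "%{:02X}".format(b) for b in path
    )


print(escape_path(b"foo\rbar"))   # the ctrl-M file from earlier in the thread
print(escape_path(b"dir/a:b#c"))  # ":" and "#" are both escaped
```

Under this reading ":", "@", "&" and "=" all come out escaped, which costs a few bytes but cannot produce an ambiguous URL.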

> 
> For 2, only the & needs to be escaped, assuming the HREF is enclosed in
> double quotes ("), so all characters except for
> a-z A-Z 0-9 $ - _ . + ! * ' ( ) , : @ =
> should be escaped.

When does Apache need to do this?

> 
> The current routine escapes : and + unnecessarily. If it were being used
> for escaping other parts of a URL (the query string perhaps), then it could
> legitimately escape ':'. However, the only significant use of escape_uri
> is by mod_dir.c; all other calls to it are immediately followed by a call
> to unescape_uri to undo the escaping.
> 
> So, change the patch to escape all the characters except those I mentioned;
> I would recommend changing the name of the routine.
> 
> Of course, that leaves the problem of converting the filename directly to
> HTML when used as the text of the anchor. A simple solution would be
> to ignore non-printing characters, and assume ISO-8859-1 for the rest.
> 
>  David.
> 
> References:
>  Fielding, R., `Relative Uniform Resource Locators', RFC 1808, UC Irvine,
>  June 1995.
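
The "simple solution" quoted above for the anchor text (ignore non-printing characters, assume ISO-8859-1 for the rest) could look roughly like the Python sketch below. The function name anchor_text is hypothetical, and treating Python's str.isprintable() as the definition of "printing" is an assumption; it also drops the C1 controls, the no-break space and the soft hyphen.

```python
import html


def anchor_text(filename: bytes) -> str:
    """Turn raw filename bytes into text safe to place inside <A>...</A>."""
    # Assume ISO-8859-1: every byte maps to exactly one character.
    text = filename.decode("latin-1")
    # Drop non-printing characters (e.g. the ctrl-M in "foo^Mbar").
    printable = "".join(ch for ch in text if ch.isprintable())
    # Escape &, < and > so the name cannot be read as markup.
    return html.escape(printable, quote=False)


print(anchor_text(b"foo\rbar"))
print(anchor_text(b"a&b<c"))
```

This only covers the visible link text; the HREF value still needs the URL escaping discussed earlier in the message.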

-- 
Ben Laurie                  Phone: +44 (181) 994 6435
Freelance Consultant        Fax:   +44 (181) 994 6472
and Technical Director      Email: ben@algroup.co.uk
A.L. Digital Ltd,
London, England.
