httpd-dev mailing list archives

From "Roy T. Fielding" <>
Subject Re: recursive robot queries
Date Mon, 01 Jan 2001 06:34:23 GMT
> These are allowed to happen due to content negotiation - any extra
> information after a valid link is presumed to simply be PATH_INFO
> information.  So in the example, the above URL will pull up
> the page "/index", i.e. index.html, with "/full/foundation/...." as the
> PATH_INFO.  How did this recursion start?
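
To make the PATH_INFO split concrete, here is a simplified model of it in Python. The server keeps shortening the request path until a prefix names something it can serve, and everything chopped off becomes PATH_INFO. This is an illustrative sketch, not Apache's request-walking code. `split_path_info` and the `RESOURCES` set are hypothetical, with `/index` standing in for the negotiated index.html.

```python
# Hypothetical set of servable resources; "/index" stands in for the
# index.html that content negotiation picks for a request to "/index".
RESOURCES = {"/index", "/docs/misc/known_client_problems"}


def split_path_info(request_path, resources):
    """Return (resource, path_info) for the longest servable prefix, or None."""
    segments = request_path.strip("/").split("/")
    # Try the full path first, then drop one trailing segment at a time.
    for i in range(len(segments), 0, -1):
        candidate = "/" + "/".join(segments[:i])
        if candidate in resources:
            rest = segments[i:]
            path_info = "/" + "/".join(rest) if rest else ""
            return candidate, path_info
    return None  # nothing matched: a real server would answer 404


print(split_path_info("/index/full/4118", RESOURCES))  # ('/index', '/full/4118')
```

Any number of trailing segments is accepted this way, which is why URLs like `/index/full/foundation/...` still come back with a 200.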

Blecko... there needs to be a way for SSI files to declare that they
are going to use PATH_INFO (or declare that they are not) so that the
server can redirect or block access to bogus URLs.
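
A rough sketch of that kind of declaration, assuming each page carries a hypothetical boolean flag saying whether it expects PATH_INFO. `ACCEPTS_PATH_INFO`, `status_for`, and the `/cgi-style/report` path are made up for illustration. (Later Apache 2.x releases added a core `AcceptPathInfo` directive along similar lines.)

```python
from http import HTTPStatus

# Hypothetical per-page declarations: does this page expect trailing PATH_INFO?
ACCEPTS_PATH_INFO = {
    "/index": False,            # plain SSI page: extra segments are bogus
    "/cgi-style/report": True,  # page that genuinely uses PATH_INFO
}


def status_for(resource, path_info):
    """Refuse PATH_INFO on pages that did not declare they use it."""
    # Pages with no declaration default to refusing extra segments.
    if path_info and not ACCEPTS_PATH_INFO.get(resource, False):
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.OK


print(int(status_for("/index", "/full/4118")))  # 404
```

Defaulting undeclared pages to "refuse" stops the runaway URLs; a server could just as well answer with a redirect to the bare resource instead of a 404.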

> I narrowed it down to this sequence of accesses from that host:
> - - [31/Dec/2000:08:07:15 -0800] "GET /docs/misc/known_client_problems.html HTTP/1.0" 200 13973 "" "Wget/1.5.3"
> - - [31/Dec/2000:08:07:25 -0800] "GET /index/full/4118 HTTP/1.0" 200 3785 "" "Wget/1.5.3"
> - - [31/Dec/2000:08:07:26 -0800] "GET /index/full/foundation/images/asf_logo.gif HTTP/1.0" 200 3785 "" "Wget/1.5.3"
> Somehow Wget is munging the link from known_client_problems.html to
> (a perfectly valid link) into a
> link to, and that URL renders what
>, only the relative URL on that page to
> foundation/images/asf_logo.gif renders out to
>, and
> getting that page leads to....
> Gar.  This is silly.  OK, so I can fix this by redirecting any requests to
> to, but that feels like and is
> an ugly hack.  What's a more general way of solving this?  Is this a bug
> in Wget?
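
The recursion in the log can be reproduced with ordinary relative-URL resolution. Assuming the page served for `/index/...` contains the relative link `foundation/images/asf_logo.gif`, each fetch resolves that link against a deeper base path. Because the server keeps answering with index.html, the chain never ends. The host `www.example.org` and the `crawl_chain` helper are hypothetical; `urllib.parse.urljoin` is Python's standard resolver.

```python
from urllib.parse import urljoin

BASE = "http://www.example.org/index/full/4118"  # hypothetical host
LINK = "foundation/images/asf_logo.gif"          # assumed relative link on the page


def crawl_chain(base, link, steps):
    """Follow the same relative link from each URL a naive crawler just fetched."""
    urls = []
    current = base
    for _ in range(steps):
        # Relative resolution replaces only the last path segment of the base.
        current = urljoin(current, link)
        urls.append(current)
    return urls


for url in crawl_chain(BASE, LINK, 3):
    print(url)
# http://www.example.org/index/full/foundation/images/asf_logo.gif
# http://www.example.org/index/full/foundation/images/foundation/images/asf_logo.gif
# http://www.example.org/index/full/foundation/images/foundation/images/foundation/images/asf_logo.gif
```

The first URL matches the third request in the log; Wget is resolving correctly, it is just being fed the same page at ever-deeper paths.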

I don't think so -- the presence of would seem to indicate
that something on our side did a redirect using the default hostname instead
of using  I suspect that it is a problem with the vhost config [but this is just guessing on my part].
Or maybe we just need to update the output to use full URLs instead
of relative links.
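
By contrast, a link that does not depend on the base path resolves to the same target no matter how many bogus segments precede it. This sketch checks that for a full URL and, as an extra option not raised in the thread, a root-relative path. The host and base URLs are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical host; the real one is not given in the thread.
FULL = "http://www.example.org/foundation/images/asf_logo.gif"
ROOT_RELATIVE = "/foundation/images/asf_logo.gif"
RELATIVE = "foundation/images/asf_logo.gif"

# Base URLs of increasing (and increasingly bogus) depth.
BASES = [
    "http://www.example.org/index",
    "http://www.example.org/index/full/4118",
    "http://www.example.org/index/full/foundation/images/asf_logo.gif",
]


def targets(link):
    """Return the set of distinct URLs `link` resolves to across BASES."""
    return {urljoin(base, link) for base in BASES}


print(len(targets(FULL)), len(targets(ROOT_RELATIVE)), len(targets(RELATIVE)))  # 1 1 3
```

A full URL does bake a hostname into the page, so it only helps if that hostname is the right one, which ties back to the vhost question above.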

