httpd-dev mailing list archives

From "Roy T. Fielding" <field...@ebuilt.com>
Subject Re: recursive robot queries
Date Mon, 01 Jan 2001 06:34:23 GMT
> These are allowed to happen due to content negotiation - any extra
> information after a valid link is presumed to simply be PATH_INFO
> information.  So in the www.apache.org example, the above URL will pull up
> the page "/index", i.e. index.html, with "/full/foundation/...." as the
> PATH_INFO.  How did this recursion start?

Blecko... there needs to be a way for SSI files to declare that they
are going to use PATH_INFO (or declare that they are not) so that the
server can redirect or block access to bogus URLs.
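
To make the idea concrete, here is a rough sketch in Python rather than anything
resembling the actual server code. The RESOURCES table, its accept flag, and both
function names are assumptions invented for illustration; the /cgi-bin/search
entry is hypothetical as well. It splits a request path into the longest known
resource plus the trailing PATH_INFO, and rejects the request when the resource
has declared that it does not use PATH_INFO.

```python
from typing import Optional, Tuple

# Hypothetical table: resource path -> whether it declares that it
# accepts trailing PATH_INFO. Not taken from any real server config.
RESOURCES = {
    "/index": False,          # plain SSI page, does not use PATH_INFO
    "/cgi-bin/search": True,  # made-up script that does use PATH_INFO
}


def split_path_info(url_path: str) -> Optional[Tuple[str, str]]:
    """Return (resource, path_info) for the longest matching prefix, or None."""
    segments = url_path.strip("/").split("/")
    # Try the full path first, then drop one trailing segment at a time.
    for end in range(len(segments), 0, -1):
        candidate = "/" + "/".join(segments[:end])
        if candidate in RESOURCES:
            rest = "/".join(segments[end:])
            return candidate, ("/" + rest if rest else "")
    return None


def status_for(url_path: str) -> int:
    """Return 200 for an acceptable request, 404 for a bogus URL."""
    match = split_path_info(url_path)
    if match is None:
        return 404
    resource, path_info = match
    if path_info and not RESOURCES[resource]:
        return 404  # the resource declared that it takes no PATH_INFO
    return 200
```

With a declaration like this, /index/full/foundation/images/asf_logo.gif would be
refused instead of returning the index page again, while a resource that really
does want PATH_INFO keeps working.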

> I narrowed it down to this sequence of accesses from that host:
> 
> httpd.apache.org 210.73.88.163 - - [31/Dec/2000:08:07:15 -0800] "GET /docs/misc/known_client_problems.html HTTP/1.0" 200 13973 "http://httpd.apache.org/docs/misc/compat_notes.html" "Wget/1.5.3"
> www.apache.org 210.73.88.163 - - [31/Dec/2000:08:07:25 -0800] "GET /index/full/4118 HTTP/1.0" 200 3785 "http://httpd.apache.org/docs/misc/known_client_problems.html" "Wget/1.5.3"
> www.apache.org 210.73.88.163 - - [31/Dec/2000:08:07:26 -0800] "GET /index/full/foundation/images/asf_logo.gif HTTP/1.0" 200 3785 "http://www.apache.org:80/index/full/4118" "Wget/1.5.3"
> 
> Somehow Wget is munging the link from known_client_problems.html to
> http://bugs.apache.org/index/full/4118 (a perfectly valid link) into a
> link to http://www.apache.org/index/full/4118, and that URL renders what
> http://www.apache.org/index does, only the relative URL on that page to
> foundation/images/asf_logo.gif renders out to
> http://www.apache.org/index/full/foundation/images/asf_logo.gif, and
> getting that page leads to....
> 
> Gar.  This is silly.  OK, so I can fix this by redirecting any requests to
> www.apache.org/index/full to www.apache.org/, but that feels like and is
> an ugly hack.  What's a more general way of solving this?  Is this a bug
> in Wget?

I don't think so -- the presence of www.apache.org:80 would seem to indicate
that something on our side did a redirect using the default hostname instead
of using bugs.apache.org.  I suspect that it is a problem with the
httpd.apache.org vhost config [but this is just guessing on my part].
Or maybe we just need to update the output to use full URLs instead
of relative links.
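
For what it's worth, the relative-link half of the problem is easy to reproduce
with any standards-following URL resolver. The snippet below is only an
illustration using Python's urllib.parse.urljoin, not what Wget does internally;
the variable names are mine, and the absolute logo URL is an assumption about
where the image really lives.

```python
from urllib.parse import urljoin

# The bogus page URL and the relative link found on the index page.
base = "http://www.apache.org/index/full/4118"
relative = "foundation/images/asf_logo.gif"

# A relative reference replaces only the last path segment of the base.
first = urljoin(base, relative)
print(first)
# http://www.apache.org/index/full/foundation/images/asf_logo.gif

# That URL returns the index page again, so the same link resolves deeper.
second = urljoin(first, relative)
print(second)
# http://www.apache.org/index/full/foundation/images/foundation/images/asf_logo.gif

# A full URL in the page (assumed location) ignores the bogus base entirely.
absolute = "http://www.apache.org/foundation/images/asf_logo.gif"
fixed = urljoin(base, absolute)
print(fixed)
# http://www.apache.org/foundation/images/asf_logo.gif
```

Each pass adds another foundation/images/ level, which is the recursion the
robot fell into; the full URL resolves to the same place no matter which base
it is read from.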

....Roy
