httpd-dev mailing list archives

From dgau...@hotwired.com (Dean Gaudet)
Subject Re: No HOST header solutions?
Date Sun, 02 Jun 1996 08:21:58 GMT
In article <hot.mailing-lists.new-httpd-199605292102.RAA06382@love-bug.ai.mit.edu>,
Robert S. Thau <new-httpd@hyperreal.com> wrote:
>Unfortunately, this has a whole bunch of nasty consequences --- for
>one, it's likely that crawlers that aren't savvy to what's going on
>(which is probably most, given that the protocol doesn't presently
>give them sufficient information to figure it out without a whole lot
>of back-end inference) will wind up indexing every single page twice,
>once under each name.

To help combat this, our robots.txt is a CGI which returns "Disallow: /"
when it can figure out that the host being requested isn't supposed
to be indexed. One thing I look for is a Host: header, but none
of the big search engines send them yet.
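
For illustration only, here is a minimal sketch of that kind of robots.txt
CGI, written in Python rather than whatever the actual script uses. The
hostname set, the function name robots_body, and the choice to stay
permissive when no Host: header arrives are all assumptions made for the
example, not details from the real setup.

```python
import os
import sys

# Hypothetical set of hostnames that are meant to be indexed.
INDEXABLE_HOSTS = {"www.example.com"}

ALLOW_ALL = "User-agent: *\nDisallow:\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_body(environ):
    """Return the robots.txt body for the request described by environ."""
    # CGI passes the Host: request header as HTTP_HOST, possibly with a port.
    raw = environ.get("HTTP_HOST", "")
    host = raw.split(":")[0].strip().lower()
    if not host:
        # No Host: header, so the requested name is unknown. Assumption:
        # fall back to allowing everything rather than blocking every crawler.
        return ALLOW_ALL
    if host in INDEXABLE_HOSTS:
        # An empty Disallow value means nothing is off limits.
        return ALLOW_ALL
    # Known to be a name that should not be indexed.
    return DENY_ALL


def main():
    # Minimal CGI response: header block, blank line, then the body.
    sys.stdout.write("Content-Type: text/plain\r\n\r\n")
    sys.stdout.write(robots_body(os.environ))


if __name__ == "__main__":
    main()
```

A crawler that sends "Host: alias.example.com" would get "Disallow: /",
while one that sends no Host: header at all gets the permissive default,
which is why the check only helps once crawlers start sending the header.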

I guess robots.txt is from pre-CGI days... the "User-agent" thing
seems more than a little useless.

Dean
