httpd-dev mailing list archives

From Brian Behlendorf <br...@organic.com>
Subject mod_dir/1057: Web robots should be told not to index auto-generated index pages
Date Tue, 26 Aug 1997 19:08:07 GMT

I'm not sure I agree with the proposition.  Anyone think otherwise?  If
not, we should close this with a "we talked about it, we disagree" response.

	Brian

>Delivered-To: apache-bugdb@qmail.hyperreal.org
>X-bandwidth-by: Hyperreal
>Date: Tue, 26 Aug 1997 08:10:02 -0700 (PDT)
>From: Olly Betts <olly@muscat.co.uk>
>Reply-To: Olly Betts <olly@muscat.co.uk>
>To: apache-bugdb@apache.org
>Cc: apache-bugdb@apache.org
>Subject: mod_dir/1057: Web robots should be told not to index auto-generated index pages
>Sender: apache-bugdb-owner@apache.org
>
>
>>Number:         1057
>>Category:       mod_dir
>>Synopsis:       Web robots should be told not to index auto-generated index pages
>>Confidential:   no
>>Severity:       non-critical
>>Priority:       medium
>>Responsible:    apache (Apache HTTP Project)
>>State:          open
>>Class:          change-request
>>Submitter-Id:   apache
>>Arrival-Date:   Tue Aug 26 08:10:01 1997
>>Originator:     olly@muscat.co.uk
>>Organization:
>apache
>>Release:        1.3a1
>>Environment:
>Linux noxious.muscat.co.uk 2.0.18 #1 Tue Sep 10 10:15:48 EDT 1996 i586
>>Description:
>A web robot rarely wants to add auto-generated pages to its database.  But it
>can't reliably spot them.  Apache could help a lot by marking such pages as
>not to be indexed by putting:
>
><META NAME=robots CONTENT=noindex>
>
>into the HTML <HEAD>...</HEAD> section.  This still allows compliant
>robots to follow links on the page, which is probably what's wanted.
>
>See <URL:http://info.webcrawler.com/mak/projects/robots/exclusion.html#meta>
>for details of the protocol.
>>How-To-Repeat:
>Look at:
>
>http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&kl=XX&q=title%3A%22Index+of%22+%22parent+directory%22
>
>which gives "about 274150" examples.
>>Fix:
>Here's a patch to 1.3a1 -- the change is actually to mod_autoindex, but
>that's not available in the picker on the bug report form.
>
>--- src/mod_autoindex.c Mon Jul 21 06:53:49 1997
>+++ src.mod/mod_autoindex.c     Tue Aug 26 11:43:28 1997
>@@ -122,6 +122,9 @@
>  * This routine puts the standard HTML header at the top of the index page.
>  * We include the DOCTYPE because we may be using features therefrom (i.e.,
>  * HEIGHT and WIDTH attributes on the icons if we're FancyIndexing).
>+ * "<META NAME=robots CONTENT=noindex>" tells robots which support the protocol
>+ * that they shouldn't index this page (but that they can follow links).
>+ * See <URL:http://info.webcrawler.com/mak/projects/robots/exclusion.html#meta>
>  */
> static void emit_preamble(request_rec *r, char *title)
> {
>@@ -131,7 +134,7 @@
>             "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">\n",
>             "<HTML>\n <HEAD>\n  <TITLE>Index of ",
>             title,
>-            "</TITLE>\n </HEAD>\n <BODY>\n",
>+            "</TITLE>\n  <META NAME=robots CONTENT=noindex>\n </HEAD>\n <BODY>\n",
>             NULL
>         );
> }
>
>>Audit-Trail:
>>Unformatted:
>
>
>
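To make concrete what "compliant robots" do with this tag, here is a minimal C sketch of the robot side. The names (robots_policy, robots_meta_policy and its helpers) are hypothetical and belong to neither Apache nor any real robot. It assumes a naive case-insensitive scan of the whole page rather than a real HTML parser, attributes written as NAME=value with no spaces around the "=", and the NOINDEX, NOFOLLOW and NONE values described on the robots exclusion page cited in the report.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: what a compliant robot may do with a page. */
typedef struct {
    bool index;   /* may the page be added to the robot's database? */
    bool follow;  /* may the robot follow links found on the page? */
} robots_policy;

/* Case-insensitive test: does s begin with prefix? */
static bool starts_with_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return false;
    return true;
}

/* Case-insensitive search for needle in the range [begin, end). */
static const char *find_ci(const char *begin, const char *end,
                           const char *needle)
{
    size_t n = strlen(needle);
    for (const char *p = begin; p < end && (size_t)(end - p) >= n; p++)
        if (starts_with_ci(p, needle))
            return p;
    return NULL;
}

/* Locate attr (e.g. "name=") inside the tag [tag, end) and return the start
 * of its value, skipping an opening double quote; *value_end receives the
 * end of the value (closing quote, or whitespace for unquoted values). */
static const char *attr_value(const char *tag, const char *end,
                              const char *attr, const char **value_end)
{
    const char *p = find_ci(tag, end, attr);
    if (p == NULL)
        return NULL;
    p += strlen(attr);
    bool quoted = (p < end && *p == '"');
    if (quoted)
        p++;
    const char *q = p;
    while (q < end && (quoted ? *q != '"' : !isspace((unsigned char)*q)))
        q++;
    *value_end = q;
    return p;
}

/* Scan every <META ...> tag; NAME=robots with NOINDEX, NOFOLLOW or NONE in
 * CONTENT restricts the default "index and follow" policy. */
robots_policy robots_meta_policy(const char *html)
{
    assert(html != NULL);
    robots_policy policy = { true, true };
    const char *doc_end = html + strlen(html);
    const char *p = html;

    while ((p = find_ci(p, doc_end, "<meta")) != NULL) {
        const char *tag_end = memchr(p, '>', (size_t)(doc_end - p));
        if (tag_end == NULL)
            break;  /* unterminated tag: stop scanning */

        const char *name_end = NULL, *content_end = NULL;
        const char *name = attr_value(p, tag_end, "name=", &name_end);
        const char *content = attr_value(p, tag_end, "content=", &content_end);

        if (name != NULL && content != NULL &&
            (size_t)(name_end - name) == strlen("robots") &&
            starts_with_ci(name, "robots")) {
            bool none = find_ci(content, content_end, "none") != NULL;
            if (none || find_ci(content, content_end, "noindex") != NULL)
                policy.index = false;
            if (none || find_ci(content, content_end, "nofollow") != NULL)
                policy.follow = false;
        }
        p = tag_end + 1;
    }
    return policy;
}
```

Fed the header that the patched emit_preamble() emits, this sketch reports index = false and follow = true: the page stays out of the robot's database, but its links are still crawled.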
--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
"Why not?" - TL           brian@organic.com - hyperreal.org - apache.org
