httpd-dev mailing list archives

From Marc Slemko <ma...@worldgate.com>
Subject Re: mod_dir/1057: Web robots should be told not to index auto-generated index pages
Date Tue, 26 Aug 1997 19:33:00 GMT
On Tue, 26 Aug 1997, Brian Behlendorf wrote:

> 
> I'm not sure I agree with the proposition.  Anyone think otherwise?  If
> not, we should close this with a "we talked about it, we disagree" response.

Indexes can contain descriptions that do not appear in the documents
themselves, e.g. "this is an image of Bill making money".  Not indexing
those is bad.

An IndexOptions option to control it, perhaps...
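As a rough illustration of the IndexOptions idea, here is a standalone C sketch (it does not use the Apache API) of a preamble builder that emits the robots META tag only when an option bit is set. The flag name HYPOTHETICAL_NO_ROBOT_INDEX and the function build_preamble are hypothetical and not part of mod_autoindex; the markup otherwise mirrors the emit_preamble strings in the quoted patch.

```c
#include <stdio.h>

/* Hypothetical IndexOptions bit: not an existing Apache flag. */
#define HYPOTHETICAL_NO_ROBOT_INDEX 0x01

/* Write the index-page preamble into buf (at most len bytes), adding the
 * robots META tag only when the hypothetical option bit is set in opts.
 * Returns what snprintf returns: the length the full output would need. */
int build_preamble(char *buf, size_t len, const char *title, unsigned opts)
{
    const char *robots = (opts & HYPOTHETICAL_NO_ROBOT_INDEX)
        ? "  <META NAME=robots CONTENT=noindex>\n"
        : "";

    return snprintf(buf, len,
        "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">\n"
        "<HTML>\n <HEAD>\n  <TITLE>Index of %s</TITLE>\n%s </HEAD>\n <BODY>\n",
        title, robots);
}
```

With the bit clear, the output matches the unpatched preamble; with it set, it matches the patched one. As in the original emit_preamble, the title is inserted without HTML escaping.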

> 
> 	Brian
> 
> >Delivered-To: apache-bugdb@qmail.hyperreal.org
> >X-bandwidth-by: Hyperreal
> >Date: Tue, 26 Aug 1997 08:10:02 -0700 (PDT)
> >From: Olly Betts <olly@muscat.co.uk>
> >Reply-To: Olly Betts <olly@muscat.co.uk>
> >To: apache-bugdb@apache.org
> >Cc: apache-bugdb@apache.org
> >Subject: mod_dir/1057: Web robots should be told not to index auto-generated index pages
> >Sender: apache-bugdb-owner@apache.org
> >
> >
> >>Number:         1057
> >>Category:       mod_dir
> >>Synopsis:       Web robots should be told not to index auto-generated index pages
> >>Confidential:   no
> >>Severity:       non-critical
> >>Priority:       medium
> >>Responsible:    apache (Apache HTTP Project)
> >>State:          open
> >>Class:          change-request
> >>Submitter-Id:   apache
> >>Arrival-Date:   Tue Aug 26 08:10:01 1997
> >>Originator:     olly@muscat.co.uk
> >>Organization:
> >apache
> >>Release:        1.3a1
> >>Environment:
> >Linux noxious.muscat.co.uk 2.0.18 #1 Tue Sep 10 10:15:48 EDT 1996 i586
> >>Description:
> >A web robot rarely wants to add auto-generated pages to its database.  But it
> >can't reliably spot them.  Apache could help a lot by marking such pages as
> >not to be indexed by putting:
> >
> ><META NAME=robots CONTENT=noindex>
> >
> >into the HTML <HEAD>...</HEAD> section.  This still allows compliant robots to
> >follow links on the page, which is probably what's wanted.
> >
> >See <URL:http://info.webcrawler.com/mak/projects/robots/exclusion.html#meta>
> >for details of the protocol.
> >>How-To-Repeat:
> >Look at:
> >
> >http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&kl=XX&q=title%3A%22Index+of%22+%22parent+directory%22
> >
> >which gives "about 274150" examples.
> >>Fix:
> >Here's a patch to 1.3a1 -- the change is actually to mod_autoindex, but that's
> >not available in the picker on the bug report form.
> >
> >--- src/mod_autoindex.c Mon Jul 21 06:53:49 1997
> >+++ src.mod/mod_autoindex.c     Tue Aug 26 11:43:28 1997
> >@@ -122,6 +122,9 @@
> >  * This routine puts the standard HTML header at the top of the index page.
> >  * We include the DOCTYPE because we may be using features therefrom (i.e.,
> >  * HEIGHT and WIDTH attributes on the icons if we're FancyIndexing).
> >+ * "<META NAME=robots CONTENT=noindex>" tells robots which support the protocol
> >+ * that they shouldn't index this page (but that they can follow links).
> >+ * See <URL:http://info.webcrawler.com/mak/projects/robots/exclusion.html#meta>
> >  */
> > static void emit_preamble(request_rec *r, char *title)
> > {
> >@@ -131,7 +134,7 @@
> >             "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">\n",
> >             "<HTML>\n <HEAD>\n  <TITLE>Index of ",
> >             title,
> >-            "</TITLE>\n </HEAD>\n <BODY>\n",
> >+            "</TITLE>\n  <META NAME=robots CONTENT=noindex>\n </HEAD>\n <BODY>\n",
> >             NULL
> >         );
> > }
> >
> >>Audit-Trail:
> >>Unformatted:
> >
> >
> >
> --=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
> "Why not?" - TL           brian@organic.com - hyperreal.org - apache.org
> 
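On the robot side, honouring the META tag proposed in the PR comes down to splitting the CONTENT value into comma-separated, case-insensitive directives, assuming the token format described at the exclusion-protocol URL cited there. The sketch below is a hypothetical illustration (struct robots_directives and parse_robots_content are not from any real robot) that handles only INDEX, NOINDEX, FOLLOW and NOFOLLOW, with both permissions defaulting to allowed and unrecognised tokens ignored. It shows why CONTENT=noindex still leaves link-following enabled.

```c
#include <ctype.h>
#include <string.h>

/* Permissions a robot derives from <META NAME=robots CONTENT=...>.
 * Both default to allowed unless a directive turns them off. */
struct robots_directives {
    int may_index;
    int may_follow;
};

/* Case-insensitive test: does the n-byte token at start equal word?
 * word must be lowercase. */
static int token_is(const char *start, size_t n, const char *word)
{
    if (strlen(word) != n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)start[i]) != word[i])
            return 0;
    return 1;
}

/* Parse a CONTENT value such as "noindex" or "NOINDEX, FOLLOW".
 * Unrecognised tokens leave the defaults untouched. */
struct robots_directives parse_robots_content(const char *content)
{
    struct robots_directives d = { 1, 1 };
    const char *p = content;

    while (*p != '\0') {
        /* Skip commas and whitespace between tokens. */
        while (*p == ',' || isspace((unsigned char)*p))
            p++;
        const char *start = p;
        while (*p != '\0' && *p != ',' && !isspace((unsigned char)*p))
            p++;
        size_t n = (size_t)(p - start);

        if (token_is(start, n, "noindex"))
            d.may_index = 0;
        else if (token_is(start, n, "index"))
            d.may_index = 1;
        else if (token_is(start, n, "nofollow"))
            d.may_follow = 0;
        else if (token_is(start, n, "follow"))
            d.may_follow = 1;
    }
    return d;
}
```

For CONTENT=noindex the result has may_index set to 0 and may_follow left at 1, which is the "don't index, but do follow links" behaviour the PR asks for.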

