httpd-dev mailing list archives

From Shiva Shivakumar <>
Subject apache/sitemaps?
Date Tue, 04 Oct 2005 04:08:46 GMT
Hello folks,

We (@Google) launched Sitemaps to optimize how crawlers work with
webservers, moving from a hit-or-miss approach to something more directed.
Currently, webcrawlers (including ours) do not know about all the pages on a
webserver, or when they change. (There is no web-world equivalent of a
simple "ls -lR" in the ftp-world.) Instead, our crawlers fetch pages that
are linked to from other pages and periodically check whether they have
changed, much like a random web surfer.

Some of the key aspects of our proposal include (a) a simple XML protocol we
released under a Creative Commons 2.0 license so that all webservers,
webmasters, and search engines can benefit from a common approach, and (b)
an open-source Sitemap generator in Python (@sourceforge) that produces
Sitemaps automatically for some common use cases.
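
To give a feel for the format, here is a rough sketch in Python. To be
clear, this is illustrative only, not the sourceforge generator: the
example.com URL is made up, and the 0.84 schema namespace is the one the
current protocol docs declare.

#!/usr/bin/env python
# Illustrative sketch only (not the sourceforge generator): write a
# Sitemap for a hand-fed list of (url, last-modified) pairs. The
# example.com URL below is made up; <urlset>/<url>/<loc>/<lastmod>
# follow the protocol, with lastmod in W3C datetime (date-only here).
import sys
import time

SITEMAP_NS = 'http://www.google.com/schemas/sitemap/0.84'

def write_sitemap(urls, out):
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="%s">\n' % SITEMAP_NS)
    for loc, mtime in urls:
        out.write('  <url>\n')
        out.write('    <loc>%s</loc>\n' % loc)
        out.write('    <lastmod>%s</lastmod>\n'
                  % time.strftime('%Y-%m-%d', time.gmtime(mtime)))
        out.write('  </url>\n')
    out.write('</urlset>\n')

if __name__ == '__main__':
    write_sitemap([('http://www.example.com/', time.time())], sys.stdout)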

It's been about 4 months since we launched, and webmasters have been using
the Sitemaps protocol (and client) to give us URLs for sites ranging from
small (e.g., 100 URLs) to large (e.g., 10M+ URLs), so we figured it is time
to ping you guys. How would the Apache webserver folks react to the Sitemaps
protocol being supported in Apache "out of the box" (e.g., as a
mod_sitemap), or to shipping the tool (or some variant) through <> as
a support program (similar to htdigest or htdbm)? And, more generally, to
offering additional mechanisms for webservers to help webcrawlers (an
increasing fraction of webserver activity) much more directly?
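
To make the support-program idea a bit more concrete, here is another rough
sketch: walk a document root, map files to URLs, and emit a Sitemap on
stdout. The docroot/prefix arguments and the .html filter are illustrative
assumptions on my part, not a proposal for the actual tool's interface.

#!/usr/bin/env python
# Rough sketch of an htdigest/htdbm-style helper: walk a document
# root, map files to URLs, and print a Sitemap on stdout. The
# command-line interface and the .html-only filter are assumptions.
import os
import sys
import time

SITEMAP_NS = 'http://www.google.com/schemas/sitemap/0.84'

def emit(docroot, prefix, out):
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="%s">\n' % SITEMAP_NS)
    for dirpath, dirnames, filenames in os.walk(docroot):
        for name in filenames:
            if not name.endswith('.html'):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, docroot).replace(os.sep, '/')
            mtime = os.path.getmtime(path)
            out.write('  <url>\n')
            out.write('    <loc>%s/%s</loc>\n' % (prefix.rstrip('/'), rel))
            out.write('    <lastmod>%s</lastmod>\n'
                      % time.strftime('%Y-%m-%d', time.gmtime(mtime)))
            out.write('  </url>\n')
    out.write('</urlset>\n')

if __name__ == '__main__':
    # usage sketch: sitemapgen /var/www/html http://www.example.com
    emit(sys.argv[1], sys.argv[2], sys.stdout)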

- shiva


Some links...
1. About Sitemaps -- <>
2. Sitemaps protocol -- <>
3. Google's open-source Sitemap generator --
4. Third-party Sitemap generators for webservers/CMS that currently support the protocol --
