httpd-dev mailing list archives

From Shiva Shivakumar <sh...@google.com>
Subject apache/sitemaps?
Date Tue, 04 Oct 2005 04:08:46 GMT
Hello folks,

We (@Google) launched Sitemaps to change how crawlers work with
webservers, from a hit-or-miss approach to something more directed.
Currently, webcrawlers (including ours) do not know about all the pages on a
webserver, or when they change. (The ftp-world has a simple "ls -lR"; we
don't have an equivalent in the web-world.) Instead, our crawlers crawl pages
that are linked to from other pages and periodically check whether they have
changed, like a random web surfer.

Some of the key aspects of our proposal include (a) a simple XML protocol we
released under a Creative Commons 2.0 license so all webservers, webmasters
and search engines could benefit from a common approach, and (b) an
open-source sitemap generator in Python (@sourceforge) that produces
Sitemaps automatically for some common use cases.
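
To make (a) concrete, here is a minimal sketch of producing such a file. It
is not the sitemap_gen.py tool, only an illustration. The function name
build_sitemap, the example.com URLs and the SITEMAP_NS value (assumed here to
be the 0.84 schema namespace) are assumptions for this sketch; the protocol
page linked below is the authoritative reference for element names and the
namespace.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumption: namespace URI of the 0.84 schema; verify against the protocol docs.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(entries):
    """Return a Sitemap XML string.

    entries: iterable of (url, lastmod) pairs, where lastmod is a datetime
    or None when the modification time is unknown.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, lastmod in entries:
        node = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL text.
        ET.SubElement(node, "loc").text = url
        if lastmod is not None:
            # Date-only form (YYYY-MM-DD) of the last modification time.
            ET.SubElement(node, "lastmod").text = lastmod.strftime("%Y-%m-%d")
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages; a real generator would walk the docroot or logs.
    print(build_sitemap([
        ("http://example.com/", datetime(2005, 10, 4)),
        ("http://example.com/docs/?page=1&lang=en", None),
    ]))
```

A server-side module or support program would presumably do the same thing
from information the webserver already has (file paths and modification
times), which is the "ls -lR" equivalent mentioned above.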

It's been about 4 months since we launched, and webmasters have been using
the Sitemaps protocol (and client) to give us URLs for both small sites (e.g.,
100 urls) and large sites (e.g., 10M+ urls), so we figured it is time to ping
you guys. How would the Apache webserver folks react to something like the
Sitemaps protocol being supported in Apache "out of the box" (e.g., as a
mod_sitemap), or to shipping the sitemap_gen.py tool (or some variant) through
http://httpd.apache.org/docs/2.1/programs/ as a support program (similar to
htdigest or htdbm)? And, in general, to offering additional mechanisms for
webservers to help webcrawlers (an increasing fraction of webserver activity)
much more directly?

thanks,
- shiva

---------------------------------------------------------------------------

Some links...
1. About Sitemaps --
http://www.google.com/webmasters/sitemaps/docs/en/about.html
2. Sitemaps protocol --
http://www.google.com/webmasters/sitemaps/docs/en/protocol.html
3. Google released open source sitemap_gen.py --
http://www.google.com/webmasters/sitemaps/docs/en/sitemap-generator.html
4. Third party sitemap generators for webservers/CMS that currently support
Sitemaps: http://code.google.com/sm_thirdparty.html
