httpd-users mailing list archives

From "Geoff Millikan" <gmilli...@t1shopper.com>
Subject RE: [users@httpd] Scrubbing log files
Date Tue, 13 Apr 2010 18:55:09 GMT
> Are there any lists of common robots on the net?  Are there 
> some regular expressions or searches that would help? Are 
> there known IP addresses that are safe to discard?

I believe your question is off topic for this forum; however, I'll share our
joy with you.

Some are known by hostname:
http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html

others by IP:
http://www.cuil.com/info/webmaster_info/ 

We whitelist certain bots; others are banned if they crawl too fast and don't
obey robots.txt.  Maintaining these lists is a lot of ongoing work,
especially when the bot operator identifies its crawlers by plain IP address
instead of forward-confirmed reverse DNS
(http://en.wikipedia.org/wiki/Forward-confirmed_reverse_DNS), which Google,
MSN, Yahoo, etc. use and which is much more flexible.
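
For what it's worth, here is a rough Python sketch of a forward-confirmed
reverse DNS check: look up the PTR name for the client IP, check that the name
ends in a domain you trust, then resolve that name forward and confirm it maps
back to the same IP.  The suffix list and the function names are my own
assumptions for illustration, not anything the crawler operators publish in
this form; check each operator's documentation for the domains they really
use.

```python
import socket

# Assumed suffix list for illustration only; verify against each
# crawler operator's own documentation before relying on it.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def forward_lookup(hostname):
    """Return the set of address strings that hostname resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return {info[4][0] for info in infos}


def is_verified_bot(ip, suffixes=TRUSTED_SUFFIXES,
                    reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check (hypothetical helper).

    The reverse and forward resolvers are parameters so the logic can
    be exercised without touching the network.
    """
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    # Step 1: the PTR name must sit under a trusted domain.
    if not hostname.endswith(tuple(suffixes)):
        return False
    # Step 2: the name must resolve back to the original IP, since
    # anyone who controls an IP block can set its PTR record.
    return ip in forward(hostname)
```

You would call is_verified_bot(client_ip) on the IPs from your log and cache
the answers, since two DNS lookups per log line gets slow.  The comparison is
a plain string match, so IPv6 addresses written in different forms may need
normalizing first.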

Some code & thoughts to keep you busy:
http://www.webmasterworld.com/google/3092423.htm
http://www.webmasterworld.com/php/3606836.htm

Thanks,

http://www.t1shopper.com/
