httpd-users mailing list archives

From David Favor <da...@davidfavor.com>
Subject Re: [users@httpd] tuning question
Date Fri, 18 Jul 2014 15:19:28 GMT
Miles Fidelman wrote:
> Hi Folks,
> 
> Every once in a while, a crawler comes along and starts indexing our site 
> - and in the process pushes our server's load average through the roof.
> 
> Short of blocking the crawlers, can anybody suggest some quick tuning 
> adjustments to make, to reduce load (setting the max. number of servers 
> and/or requests, renicing processes)?
> 
> Yes - my next step is to go pore through manuals - but I expect others 
> have done this enough to be able to point me at a few specific config 
> file lines to change, and specific commands for identifying and renicing 
> processes.
> 
> Thanks very much,
> 
> Miles Fidelman
> 

http://BadBotBlocker.com is some code I use on client sites to do adaptive blocking.

    Warning: code is ugly + requires a good cleanup + packaging for different OSes.

Pretty simple.

1) Anyone who follows the /bad-spider/ link gets blocked for 1 hour
    via iptables (all ports + protocols)

2) After 1 hour the iptables block rule is removed

3) Add links to /bad-spider/, hidden with CSS display:none, in every file served,
    so only non-humans ever "see" this link

4) List /bad-spider/ as disallowed for all user agents in robots.txt,
    so well-behaved crawlers never follow the link
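
A rough sketch of steps 3 and 4, not the actual BadBotBlocker code. The link markup and the robots.txt wording here are my assumptions about how this could look. The check uses Python's standard urllib.robotparser to confirm that a crawler honoring robots.txt would skip the trap path while the rest of the site stays crawlable.

```python
from urllib.robotparser import RobotFileParser

# Step 4: robots.txt entry telling every user agent to stay out of the trap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bad-spider/
"""

# Step 3: trap link hidden from humans. display:none is a CSS property,
# so it goes in the style attribute (illustrative markup only).
TRAP_LINK = '<a href="/bad-spider/" style="display:none" rel="nofollow">.</a>'


def allowed(user_agent, path):
    """Return True if a robots.txt-respecting crawler may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(allowed("ExampleBot", "/bad-spider/"))  # trap path
    print(allowed("ExampleBot", "/index.html"))   # normal page
```

Crawlers that obey robots.txt never touch /bad-spider/, so only the ones that ignore it end up triggering the block.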

This simple approach has cut down traffic by 80-90% for some of my clients.

Because the rules only last for 1 hour, no site is blocked forever.

Because the rules are adaptive (appear + disappear), there is no maintenance.

After reboots, all rules are lost + process just starts again.
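
For the block + expire cycle, here is a minimal sketch under my own assumptions, not the real code. The names BlockTable, hit and sweep are made up, it handles IPv4 only, and it just builds the iptables argument lists rather than running them. State is kept in memory, which matches the "lost on reboot" behavior.

```python
import ipaddress
import time

BLOCK_SECONDS = 3600  # the 1 hour block described above


def block_cmd(ip):
    """iptables args dropping all traffic from ip (no -p, so all protocols + ports)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError if ip is not valid IPv4
    return ["iptables", "-I", "INPUT", "-s", str(addr), "-j", "DROP"]


def unblock_cmd(ip):
    """iptables args deleting the rule that block_cmd inserted."""
    addr = ipaddress.IPv4Address(ip)
    return ["iptables", "-D", "INPUT", "-s", str(addr), "-j", "DROP"]


class BlockTable:
    """In-memory record of blocked IPs and when each block expires."""

    def __init__(self, ttl=BLOCK_SECONDS):
        self.ttl = ttl
        self.expires = {}  # ip -> expiry timestamp in seconds

    def hit(self, ip, now=None):
        """Record a /bad-spider/ request; return the block command, or None if already blocked."""
        now = time.time() if now is None else now
        if ip in self.expires:
            return None
        cmd = block_cmd(ip)  # validates ip before anything is recorded
        self.expires[ip] = now + self.ttl
        return cmd

    def sweep(self, now=None):
        """Return unblock commands for every block that has expired."""
        now = time.time() if now is None else now
        due = [ip for ip, t in self.expires.items() if t <= now]
        for ip in due:
            del self.expires[ip]
        return [unblock_cmd(ip) for ip in due]
```

Each returned list could be handed to subprocess.run(cmd, check=True) by a process running as root, with sweep() called every minute or so from a timer or cron.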

Very simple code.

- David, Skype: davidfavor

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

