httpd-dev mailing list archives

From André Warnier
Subject Re: URL scanning by bots
Date Wed, 01 May 2013 11:51:37 GMT
Marian Marinov wrote:
> On 05/01/2013 12:19 PM, Tom Evans wrote:
>> On Wed, May 1, 2013 at 1:47 AM, André Warnier wrote:
>>> Christian Folini wrote:
>>>> Hey André,
>>>> I do not think your protection mechanism is very good (for reasons
>>>> mentioned before). But you can try it out for yourself easily with 2-3
>>>> ModSecurity rules and the "pause" directive.
>>>> Regs,
>>>> Christian
>>> Hi Christian.
>>> With respect, I think that you misunderstood the purpose of the proposal.
>>> It is not a protection mechanism for any server in particular.
>>> And installing the delay on one server is not going to achieve much.
>> Putting in any kind of delay means using more resources to deal with
>> the same number of requests, even if you use a dedicated 'slow down'
>> worker to deal especially just with this.
>> The truth of the matter is that these sorts of spidering requests are
>> irrelevant noise on the internet. It's not a targeted attack, it is
>> simply someone looking for easy access to any machine.
> I'm Head of Sysops at a fairly large hosting provider, we have more than
> 2000 machines and I can assure you, this 'noise' as you call it accounts
> for about 20-25% of all requests to our servers. And the spam uploaded
> on our servers accounts for about 35-40% of the DB size of all of our
> customers.
>>> It is something that, if it is installed on enough webservers on the
>>> Internet, may slow down the URL-scanning bots (hopefully a lot), and
>>> thereby inconvenience their botmasters. Hopefully to the point where they
>>> would decide that it is not worth scanning that way anymore. And if it
>>> does not inconvenience them enough to achieve that, at least it should
>>> reduce the effectiveness of these bots, and diminish the number of systems
>>> that they can scan over any given time period with the same number of bots.
>> Well, no, actually this is not accurate. You are assuming that these
>> bots are written using blocking io semantics; that if a bot is delayed
>> by 2 seconds when getting a 404 from your server, it is not able to do
>> anything else in those 2 seconds. This is just incorrect.
>> Each bot process could launch multiple requests to multiple unrelated
>> hosts simultaneously, and select whatever ones are available to read
>> from. If you could globally add a delay to bots on all servers in the
>> world, all the bot owner needs to do to maintain the same throughput
>> is to raise the concurrency level of the bot's requests. The bot does
>> the same amount of work in the same amount of time, but now all our
>> servers use extra resources and are slow for clients on 404.
> Actually, what we are observing is the complete opposite of what you are
> saying.
> Delaying spam bots, brute force attacks, and vulnerability scanners
> significantly decreases the number of requests we get from them.
> So, our observation tells us that if you pretend that your machine is
> slow, the bots abandon this IP and continue to the next one.
> I believe that the bots are doing that, because there are many 
> vulnerable machines on the internet and there is no point in losing time 
> with a few slower ones. I may be wrong, but this is what we have seen.

Thank you immensely.
This illustrates perfectly one of the problems I am encountering with this proposal.
Most of the objections to it are made by people who seem to hold some "a priori" idea of
how bots or bot-masters would act or react, but who do not provide any actual facts to
substantiate their opinion.

Once again: I am a small-time webserver administrator, looking after a small collection of
webservers. My view of what happens on the Internet at large is limited to what I can
observe on my own servers. I do /not/ claim that this proposal is correct, and I do /not/
claim that its ultimate effect will be what I hope it could be.

But *based on the actual data and patterns which I can observe on my servers (not
guesses), I think it might have an effect*. And when I try to substantiate this with some
rough calculations (also based on real numbers which I can observe), so far I can see
nothing that would tell me that I am dead wrong.
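
To make the shape of such a rough calculation concrete, here is a minimal sketch in
Python. It is only an illustration: the 2-second delay is the figure used as an example
earlier in this thread, while the 0.5-second undelayed response time and the function
names are assumptions of mine, not measurements. It shows both sides of the argument: how
much a delay slows a bot that probes one URL at a time, and how much extra concurrency a
bot would need to cancel the delay out.

```python
# Rough model of a URL-scanning bot. Every number used here is a hypothetical
# placeholder chosen for illustration, not a value measured on a real server.

SECONDS_PER_DAY = 24 * 60 * 60


def probes_per_day(base_latency_s, added_delay_s, concurrency=1):
    """Probes one bot completes per day when each probe takes
    base_latency_s + added_delay_s and `concurrency` probes run in parallel."""
    return concurrency * SECONDS_PER_DAY / (base_latency_s + added_delay_s)


def concurrency_to_compensate(base_latency_s, added_delay_s):
    """Factor by which a bot must raise its concurrency to keep the same
    throughput once every probe is slowed by added_delay_s."""
    return (base_latency_s + added_delay_s) / base_latency_s


if __name__ == "__main__":
    base = 0.5   # assumed: an undelayed 404 takes 0.5 s round trip
    delay = 2.0  # assumed: 2 s artificial delay added to every 404

    print("undelayed probes/day:", probes_per_day(base, 0.0))
    print("delayed probes/day:  ", probes_per_day(base, delay))
    print("concurrency factor needed to compensate:",
          concurrency_to_compensate(base, delay))
```

Under these assumptions a bot that probes sequentially loses most of its daily throughput,
while a bot that can freely raise its concurrency loses none. Which of the two better
describes real bots is exactly the question that only observed data can answer.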

There is so far one possible pitfall, which was identified by someone earlier on this
list: the fact that delaying 404 responses might have a bad effect on some particular kind
of usage by legitimate clients/users. So far, I believe that such an effect could be
mitigated by the fact that this option could be turned off by any webserver administrator
with a modicum of knowledge.
