httpd-dev mailing list archives

From André Warnier
Subject Re: URL scanning by bots
Date Tue, 30 Apr 2013 23:09:53 GMT
Graham Leggett wrote:
> On 30 Apr 2013, at 12:03 PM, André Warnier <> wrote:
>> The only cost would be a relatively small change to the Apache webservers, which is
>> what my suggestion consists of : adding a variable delay (say between 100 ms and
>> 2000 ms) to any 404 response.
>
> This would have no real effect.
>
> Bots are patient, slowing them down isn't going to inconvenience a bot in any way. The
> simple workaround if the bot does take too long is to simply send the requests in
> parallel. At the same time, slowing down 404s would break real websites, as 404 isn't
> necessarily an error, but rather simply a notice that says the resource isn't found.
Thank you for your response.
You make several points above, and I would like to respond to them separately.

1) This would have no real effect.
A: yes, it would.

This is a facetious response, of course.  I am making it just in order to illustrate a 
kind of objection which I have encountered before : an "a priori" objection, without a 
real justification.  So I am responding in kind.
This was just for illustration; I hope that you don't mind.

But, you /do/ provide some arguments to justify that, so let me discuss them :

2) "Bots are patient, slowing them down isn't going to inconvenience a bot in any way"

A: I beg to disagree.
First, I would make a distinction between "the bot" (which is just a program running 
somewhere, and can obviously not be inconvenienced), and the "owner" of the bot, usually 
called "bot-master".  And I believe that the bot-master can be seriously inconvenienced.
And through the bots, he is the real target.

Here are my reasons for believing that he can be inconvenienced :

It may seem that creating a bot, distributing it and running it for malicious purposes is 
free. But that is not true : it has a definite cost.
Most countries now have laws defining this as criminal behaviour, and many countries now 
have dedicated officials who are trying to track down "bot-masters" and bring them to justice.
So the very first cost of running a botnet is the risk of getting caught, 
paying a big fine and maybe going to prison. And this is not just theory.
There have been several items in the news in the last few years that show this to be 
true.  Search Google for "botmaster jailed", for example.

As a second argument, I would state that if it did not cost anything to create and run a 
botnet, then nobody would pay for it.  And that is not true. Nowadays one can purchase bot 
code, or rent an existing botnet - or even parts of it - for a price. And the price is not 
trivial. To rent a botnet of several thousand bots for a week can cost several thousand US 
Dollars.  And obviously, there is a market.
See here :
or search Google for equivalent information.

If it does cost something to create and run a malicious botnet, then whoever does it 
does it in order to get a return on his investment.
The kind of desired return can vary (think Anonymous or some intelligence services), but 
it is obvious to me that if someone is running botnets which *do* scan my servers (and 
most servers on the Internet) for vulnerable URLs, they are not doing this for the simple 
pleasure of doing it. They are expecting a return, or else they wouldn't do it.
The faster that they can scan servers and identify likely targets for further mischief, 
the better the return compared to the costs.
As long as the likely return outweighs the costs, they will continue.
Raise the cost or lower the return below a certain threshold, however, and it will become 
uneconomical, and they will stop.
At what point this would happen, I can't tell.
But I do know one thing : what I am suggesting /would/ slow them down, so it goes in the 
right direction : to raise their cost and/or diminish their return.

3) "The simple workaround if the bot does take too long is to simply send the requests in 
parallel."

A: I already mentioned that point in my original suggestion and I tried to show that it 
doesn't really matter, but let me add another aspect :

The people who run bots are not using their own computers or their own bandwidth to do 
this.  That would be really uneconomical, and really dangerous for them.
Instead, they rely on discreetly "infecting" computers belonging to other people, and 
then using those computers and their bandwidth to run their operation.

If your computer has been infected and is running a bot in the background, you may not 
notice it, as long as the bot is using a small amount of resources.
But if the bot running on your computer starts to use any significant amount of CPU or 
bandwidth, then the probability of you noticing will increase.  And if you notice it, you 
will kill it, won't you ? And if you do that, there is one less bot in the botnet.

What I am saying is that one cannot just increase forever the amount of parallelism in 
the scans that a bot is performing. There is a limit to the amount of resources that a bot 
can use in its host while remaining discreet.
My original sample calculations used individual bots, each issuing 200 requests in 2 
seconds.  How many more can one bot issue and remain discreet ?
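
To put rough numbers on that : the figures below are purely my own assumptions for the
sake of illustration (0.1 s per probe today, 1.05 s average added delay per 404 - the
midpoint of the proposed 100-2000 ms range - and 10 parallel requests per bot); only the
"200 requests in 2 seconds" baseline comes from my earlier example :

/* Back-of-the-envelope only; all inputs are assumptions, not measurements. */
#include <stdio.h>

int main(void)
{
    double probes           = 200.0;  /* probes per scanned server (my earlier example) */
    double base_per_probe_s = 0.1;    /* assumed time per probe without the delay       */
    double added_delay_s    = 1.05;   /* assumed average added delay per 404 response   */
    double concurrency      = 10.0;   /* assumed parallel requests one bot dares to use */

    double before = probes * base_per_probe_s / concurrency;
    double after  = probes * (base_per_probe_s + added_delay_s) / concurrency;

    printf("one server, today:          %5.1f s\n", before);         /*  2.0 s */
    printf("one server, with the delay: %5.1f s\n", after);          /* 23.0 s */
    printf("slowdown factor:            %5.1f x\n", after / before); /* 11.5 x */
    return 0;
}

To get back to its old 2 seconds per server, the bot would have to multiply its
concurrency by that same factor of more than ten, and that is exactly the kind of extra
CPU and bandwidth usage which risks getting it noticed and killed.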

So really, if you admit that the suggestion, if implemented, would slow down the action of 
scanning a number of servers, then in order to keep scanning the same number of servers in 
the same time, the only practical response would be to increase the number of bots doing 
the scanning.
And then, we run back to the argument above : it increases the cost.

4) "At the same time, slowing down 404s would break real websites, as 404 isn't 
necessarily an error, but rather simply a notice that says the resource isn't found."

A: I believe that this is a trickier objection.
I agree, a 404 is just an indication that the resource isn't found.

But I have been trying to figure out a real use case, where expecting 404 responses in the 
course of legitimate applications or website access would be a normal thing to do, and I 
admit that I haven't been able to think of any.
Can you come up with an example where this would really be a use case, and where delaying 
404 responses would really "break something" ?

I would also like to add a clarification : my suggestion is to make this an *optional* 
feature that can be easily tuned or disabled by a webserver administrator.
(Similarly to a number of other security-minded configuration directives in Apache httpd)
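
Purely to make the idea concrete, here is a very rough and untested sketch of what such a
module might look like.  Nothing below exists in httpd today : the module name, the filter
name and the hard-coded 100-2000 ms range are all my own assumptions, and a real version
would of course expose the range (and an on/off switch) through configuration directives :

/* mod_delay404.c - rough, untested sketch only; not an existing httpd module. */
#include <stdlib.h>

#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "util_filter.h"
#include "apr_time.h"

static const char delay404_filter_name[] = "DELAY404";

/* Output filter : if the request ended up as a 404, sleep a random
 * 100-2000 ms before letting the response out, then remove ourselves
 * so we only fire once per request. */
static apr_status_t delay404_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    request_rec *r = f->r;

    if (r->status == HTTP_NOT_FOUND) {
        /* apr_sleep() takes microseconds; rand() is good enough for a
         * sketch, not for anything serious. */
        apr_interval_time_t delay_us = 100000 + (rand() % 1900001);
        apr_sleep(delay_us);
    }

    ap_remove_output_filter(f);
    return ap_pass_brigade(f->next, bb);
}

static void delay404_insert_filter(request_rec *r)
{
    ap_add_output_filter(delay404_filter_name, NULL, r, r->connection);
}

static void delay404_register_hooks(apr_pool_t *p)
{
    ap_register_output_filter(delay404_filter_name, delay404_filter,
                              NULL, AP_FTYPE_CONTENT_SET);
    ap_hook_insert_filter(delay404_insert_filter, NULL, NULL, APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA delay404_module = {
    STANDARD20_MODULE_STUFF,
    NULL,                       /* per-directory config creation  */
    NULL,                       /* per-directory config merging   */
    NULL,                       /* per-server config creation     */
    NULL,                       /* per-server config merging      */
    NULL,                       /* command table; a real version
                                   would put its directives here  */
    delay404_register_hooks
};

One thing such a sketch makes visible : sleeping like this keeps a worker (thread or
process) busy for the duration of the delay, so on a busy site the delay range would have
to be chosen with the MPM's worker count in mind.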

It would just be a lot more effective if it was enabled by default, in the standard 
configuration of the standard Apache httpd distributions.
The reason for that is again a numbers game : there are about 600 million webservers in 
total, at least 60% of them (360 million) being "Apache" servers.
Of these 360 million, how many would you say are professionally installed and managed ?
(How many competent webserver administrators are there in the world, and how many 
webservers can each one of them take care of ?)
If I were to venture a number, I would say that the number of Apache webservers that are 
professionally installed and managed is probably not higher than a few millions, maybe 
10% of the above.
That leaves many more millions which are not so, and those are the target of the 
suggestion.  If it was a default option, then over time, as new Apache httpd webservers are 
installed - or older ones upgraded - the proportion of servers where this option is 
activated would automatically increase, without any further intervention.

And as I have already tried to show, every additional percent of the installed 
webservers where this would be active increases the total URL scan time by several 
million seconds. No matter how parallel the scan is, that number doesn't change.
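
For whatever it is worth, here is the same kind of back-of-the-envelope calculation for
the totals.  Only the server counts come from the figures above; the per-scan numbers are
again my own assumptions :

/* Illustration of the "numbers game" above; all inputs are rough estimates. */
#include <stdio.h>

int main(void)
{
    double apache_servers   = 360e6;  /* ~60% of ~600 million webservers         */
    double fraction_enabled = 0.01;   /* each additional 1% that has the delay   */
    double probes_per_scan  = 200.0;  /* 404-producing probes per scanned server */
    double avg_delay_s      = 1.05;   /* average of the 100-2000 ms range        */

    double added_seconds = apache_servers * fraction_enabled
                         * probes_per_scan * avg_delay_s;

    printf("added scan time per additional 1%%: %.0f seconds\n", added_seconds);
    /* 360e6 * 0.01 * 200 * 1.05 = 756,000,000 seconds of aggregate extra work
     * for the scanners, however many bots it is spread over. */
    return 0;
}

Even counting only a single second of added delay per scanned server instead of every
delayed probe, one additional percent is still 3.6 million seconds - which is where my
"several million seconds" comes from; counting every delayed probe, the aggregate cost to
the scanners is far higher.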

I hope to have provided convincing arguments in my responses to your objections.
And if not, I'll try harder.

There is also a limit for me though : I have neither the skills nor the resources to 
actually set up a working model of this.  I cannot create (or rent) a real botnet and 
thousands of target servers in order to really prove my arguments.
But maybe someone could think of a way to really prove or disprove this ? Whatever the 
results, I would be really delighted.
