httpd-dev mailing list archives

From André Warnier
Subject Re: URL scanning by bots
Date Wed, 01 May 2013 00:47:55 GMT
Christian Folini wrote:
> Hey André,
> I do not think your protection mechanism is very good (for reasons
> mentioned before).  But you can try it out for yourself easily with
> 2-3 ModSecurity rules and the "pause" directive.
> Regs,
> Christian
Hi Christian.

With respect, I think that you misunderstood the purpose of the proposal.
It is not a protection mechanism for any server in particular.
And installing the delay on one server is not going to achieve much.

It is something that, if it is installed on enough webservers on the Internet, may slow 
down the URL-scanning bots (hopefully a lot), and thereby inconvenience their botmasters.

Hopefully to the point where they would decide that it is not worth scanning that way 
anymore.  And if it does not inconvenience them enough to achieve that, at least it should 
reduce the effectiveness of these bots, and diminish the number of systems that they can 
scan over any given time period with the same number of bots.

> On Tue, Apr 30, 2013 at 12:03:28PM +0200, André Warnier wrote:
>> Dear Apache developers,
>> This is a suggestion relative to the code of the Apache httpd webserver, and a possible
>> new default option in the standard distribution of Apache httpd.
>> It also touches on WWW security, which is why I felt that it belongs on this list, rather
>> than on the general user's list. Please correct me if I am mistaken.
>> According to Netcraft, there are currently some 600 Million webservers on the WWW, with
>> more than 60% of those identified as "Apache".
>> I currently administer about 25 of these webservers (Apache httpd/Tomcat), not remarkable in
>> any way (business applications for medium-sized companies).
>> In the logs of these servers, every day, there are episodes like the following :
>> - - [03/Apr/2013:00:52:32 +0200] "GET /muieblackcat HTTP/1.1" 404 362 "-" "-"
>> - - [03/Apr/2013:00:52:36 +0200] "GET //admin/index.php HTTP/1.1" 404 365 "-" "-"
>> - - [03/Apr/2013:00:52:36 +0200] "GET //admin/pma/index.php HTTP/1.1" 404 369 "-" "-"
>> - - [03/Apr/2013:00:52:36 +0200] "GET //admin/phpmyadmin/index.php HTTP/1.1" 404 376 "-" "-"
>> - - [03/Apr/2013:00:52:37 +0200] "GET //db/index.php HTTP/1.1" 404 362 "-" "-"
>> - - [03/Apr/2013:00:52:37 +0200] "GET //dbadmin/index.php HTTP/1.1" 404 367 "-" "-"
>> ... etc..
>> Such lines are the telltale trace of a "URL-scanning bot", or of the "URL-scanning" part of
>> a bot, and I am sure that you are all familiar with them.  Obviously, these bots are
>> trying to find webservers which exhibit poorly-designed or poorly-configured applications,
>> with the aim of identifying hosts which can be submitted to various kinds of attacks, for
>> various purposes.  As far as I can tell from my own unremarkable servers, I would guess
>> that many or most webservers facing the Internet are submitted to this type of scan every
>> day.
>> Hopefully, most webservers are not really vulnerable to this type of scan.
>> But the fact is that *these scans are happening, every day, on millions of webservers*.
>> And they are at least a nuisance, and at worst a serious security problem when, as a
>> result of poorly configured webservers or applications, they lead to break-ins and
>> compromised systems.
>> It is basically a numbers game, like malicious emails : it costs very little to do,
>> and if even a tiny proportion of webservers exhibit one of these vulnerabilities, because
>> of the numbers involved, it is worth doing it.
>> If there are 600 Million webservers, and 50% of them are scanned every day, and 0.01% of
>> these webservers are vulnerable because of one of these URLs, then it means that every
>> day, 30,000 (600,000,000 x 0.5 x 0.0001) vulnerable servers will be identified.
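The arithmetic of that estimate can be checked with a few lines of Python (all figures are the message's assumptions, not measurements):

```python
# Checking the numbers-game estimate above; the inputs are the message's
# assumptions (Netcraft total, 50% scanned daily, 0.01% vulnerable).
servers = 600_000_000      # webservers on the WWW
scanned_share = 0.5        # fraction scanned on a given day
vulnerable_share = 0.0001  # 0.01% vulnerable via one of these URLs

found_per_day = servers * scanned_share * vulnerable_share
print(round(found_per_day))  # 30000
```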
>> About the "cost" aspect : from the data in my own logs, such bots seem to be scanning
>> about 20-30 URLs per pass, at a rate of about 3-4 URLs per second.
>> Since it is taking my Apache httpd servers approximately 10 ms on average to respond (by a
>> 404 Not Found) to one of these requests, and they only request 1 URL per 250 ms, I would
>> imagine that these bots have some built-in rate-limiting mechanism, to avoid being
>> "caught" by various webserver-protection tools.  Maybe also they are smart, and scan
>> several servers in parallel, so as to limit the rate at which they "burden" any server in
>> particular. (In this rough calculation, I am ignoring network latency for now.)
>> So if we imagine a smart bot which is scanning 10 servers in parallel, issuing 4 URLs
>> per second to each of them, for a total of 20 URLs per server, and we assume that all
>> these requests result in 404 responses with an average response time of 10 ms, then it
>> "costs" this bot only about 2 seconds to complete the scan of 10 servers.
>> If there are 300 Million servers to scan, then the total cost for scanning all these
>> servers, by any number of such bots working cooperatively, is an aggregated 60 Million
>> seconds.  And if one of such "botnets" has 10,000 bots, that boils down to only 6,000
>> seconds per bot.
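As a sanity check, the same cost model in a short Python sketch (parameters taken from the message; rate-limiting waits and network latency are ignored, as in the original back-of-the-envelope calculation):

```python
# Rough cost model for the scan described above.  Only server response
# time is counted, as in the message's own calculation.
servers = 300_000_000   # servers scanned (50% of 600 million)
urls_per_server = 20    # URLs probed per server
resp_404_ms = 10        # average time to obtain a 404, in milliseconds
bots = 10_000           # bots in the hypothetical botnet

aggregate_s = servers * urls_per_server * resp_404_ms / 1000  # 60,000,000 s
per_bot_s = aggregate_s / bots                                # 6,000 s per bot
print(aggregate_s, per_bot_s)
```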
>> Scary, that 50% of all Internet webservers can be scanned for vulnerabilities in less than
>> 2 hours, and that such a scan may result in "harvesting" several thousand hosts, as
>> candidates for takeover.
>> Now, how about making it so that, without any special configuration or add-on software or
>> skills on the part of webserver administrators, it would cost these same bots *considerably
>> longer (several days)* to do their scan ?
>> The only cost would be a relatively small change to the Apache webservers, which is what my
>> suggestion consists of : adding a variable delay (say between 100 ms and 2000 ms) to any
>> 404 response.
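To make the proposal concrete, here is a minimal Python sketch of the intended behaviour; the function names are hypothetical, and this is not actual httpd code, just an illustration of "randomly delay the 404s and nothing else":

```python
import random
import time

# Hypothetical sketch only -- not actual Apache httpd code.
# The idea: 404 responses get a random extra delay (100 ms .. 2000 ms);
# every other status code is served immediately.

def delay_for(status):
    """Return the artificial delay, in seconds, for a given status code."""
    if status == 404:
        return random.uniform(0.1, 2.0)  # proposed 100 ms .. 2000 ms range
    return 0.0

def respond(status):
    time.sleep(delay_for(status))  # stall only the 404s
    # ... generate and send the actual HTTP response here ...
```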
>> The suggestion is based on the observation that there is a dichotomy between this kind of
>> access by bots, and the kind of access made by legitimate HTTP users/clients : legitimate
>> users/clients (including the "good bots") are accessing mostly links "which work", so they
>> rarely get "404 Not Found" responses.  Malicious URL-scanning bots on the other hand, by
>> the very nature of what they are scanning for, are getting many "404 Not Found" responses.
>> As a general idea thus, anything which impacts the delay to obtain a 404 response will
>> impact these bots much more than it impacts legitimate users/clients.
>> How much ?
>> Let us imagine for a moment that this suggestion is implemented in the Apache webservers,
>> and is enabled in the default configuration.  And let's imagine that after a while, 20% of
>> the Apache webservers deployed on the Internet have this feature enabled, and are
>> delaying any 404 response by an average of 1000 ms.
>> And let's re-use the numbers above, and redo the calculation.
>> The same "botnet" of 10,000 bots is thus still scanning 300 Million webservers, each bot
>> scanning 10 servers at a time for 20 URLs per server.  Previously, this took about 6,000
>> seconds.
>> However now, instead of an average delay of 10 ms to obtain a 404 response, in 20% of the
>> cases (60 Million webservers) they will experience an average 1000 ms additional delay per
>> URL scanned.
>> This adds (60,000,000 / 10 * 20 URLs * 1000 ms) 120,000,000 seconds to the scan.
>> Divided by 10,000 bots, this is 12,000 additional seconds per bot (roughly 3 1/2 hours).
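That recalculation can be reproduced in Python (the division by 10 reflects the message's assumption that each bot scans 10 servers in parallel, so their delays overlap):

```python
# Redoing the 20%-adoption calculation from the message.
delaying_servers = 60_000_000  # 20% of the 300 million servers scanned
urls_per_server = 20           # URLs probed per server
extra_delay_ms = 1000          # average added delay per 404 response
parallel = 10                  # servers scanned concurrently by each bot,
                               # so their delays overlap
bots = 10_000                  # bots in the hypothetical botnet

extra_scan_s = delaying_servers * urls_per_server * extra_delay_ms / 1000 / parallel
extra_per_bot_s = extra_scan_s / bots  # roughly 3 1/2 hours per bot
print(extra_scan_s, extra_per_bot_s)
```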
>> So with a small change to the code, no add-ons, no special configuration skills on the
>> part of the webserver administrator, no firewalls, no filtering, no need for updates to
>> any list of URLs or bot characteristics, little inconvenience to legitimate users/clients,
>> and a very partial adoption over time, it seems that this scheme could more than double
>> the cost for bots to acquire the same number of targets.  Or, seen another way, it could
>> more than halve the number of webservers being scanned every day.
>> I know that this is a hard sell.  The basic idea sounds a bit too simple to be effective.
>> It will not kill the bots, and it will not stop the bots from scanning Internet servers in
>> the other ways that they use. It does not miraculously protect any single server against
>> these scans, and the benefit of any one server implementing this is diluted over all
>> webservers on the Internet.
>> But it is also not meant as an absolute weapon.  It is targeted specifically at a
>> particular type of scan done by a particular type of bot for a particular purpose, and it
>> is just a scheme to make this more expensive for them.  It may or may not discourage the
>> bots from continuing with this type of scan (if it does, that would be a very big win).
>> But at the same time, compared to any other kind of tool that can be used against these
>> scans, this one seems really cheap to implement, it does not seem to be easy to
>> circumvent, and it seems to have at least a potential of bringing big benefits to the WWW
>> at large.
>> If there are reasonable objections to it, I am quite prepared to accept that, and drop it.
>>  I have already floated the idea in a couple of other places, and gotten what could be
>> described as "tepid" responses.  But it seems to me that most of the negative-leaning
>> responses which I received so far were more of the a-priori "it will never work" kind,
>> rather than real objections based on real facts.
>> So my hope here is that someone has the patience to read through this, and would have the
>> additional patience to examine the idea "professionally".
