httpd-dev mailing list archives

From Christian Folini <>
Subject Re: URL scanning by bots
Date Tue, 30 Apr 2013 11:49:45 GMT
Hey André,

I do not think your protection mechanism is very good (for reasons
mentioned before). But you can easily try it out for yourself with
2-3 ModSecurity rules and the "pause" action.



On Tue, Apr 30, 2013 at 12:03:28PM +0200, André Warnier wrote:
> Dear Apache developers,
> This is a suggestion relative to the code of the Apache httpd webserver, and a possible
> new default option in the standard distribution of Apache httpd.
> It also touches on WWW security, which is why I felt that it belongs on this list, rather
> than on the general users' list. Please correct me if I am mistaken.
> According to Netcraft, there are currently some 600 Million webservers on the WWW, with
> more than 60% of those identified as "Apache".
> I currently administer about 25 of these webservers (Apache httpd/Tomcat), not remarkable
> in any way (business applications for medium-sized companies).
> In the logs of these servers, every day, there are episodes like the following:
> - - [03/Apr/2013:00:52:32 +0200] "GET /muieblackcat HTTP/1.1" 404 362 "-" "-"
> - - [03/Apr/2013:00:52:36 +0200] "GET //admin/index.php HTTP/1.1" 404 "-" "-"
> - - [03/Apr/2013:00:52:36 +0200] "GET //admin/pma/index.php HTTP/1.1" 369 "-" "-"
> - - [03/Apr/2013:00:52:36 +0200] "GET //admin/phpmyadmin/index.php HTTP/1.1" 404 376 "-" "-"
> - - [03/Apr/2013:00:52:37 +0200] "GET //db/index.php HTTP/1.1" 404 362 "-" "-"
> - - [03/Apr/2013:00:52:37 +0200] "GET //dbadmin/index.php HTTP/1.1" 404 "-" "-"
> ... etc.
> Such lines are the telltale trace of a "URL-scanning bot", or of the "URL-scanning" part of
> a bot, and I am sure that you are all familiar with them. Obviously, these bots are
> trying to find webservers which exhibit poorly-designed or poorly-configured applications,
> with the aim of identifying hosts which can be subjected to various kinds of attacks, for
> various purposes. As far as I can tell from my own unremarkable servers, I would surmise
> that many or most webservers facing the Internet are subjected to this type of scan every
> day.
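One way to spot such a scan in your own logs is to count 404 responses per client. The Python sketch below assumes the Apache "combined" log format with the client address as the first field (the excerpt quoted above has that field stripped), a hypothetical threshold of 5, and made-up client addresses from the documentation ranges; it is an illustration, not a tested detection tool.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def hosts_with_many_404s(lines, threshold=5):
    """Return {host: number_of_404s} for clients with at least `threshold` 404s."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match and match.group("status") == "404":
            counts[match.group("host")] += 1
    return {host: n for host, n in counts.items() if n >= threshold}


def synthetic_line(host, path, status):
    # Hypothetical log line: address, timestamp and byte count are placeholders.
    return (f'{host} - - [03/Apr/2013:00:52:36 +0200] '
            f'"GET {path} HTTP/1.1" {status} 300 "-" "-"')


if __name__ == "__main__":
    scanned_paths = ["/muieblackcat", "//admin/index.php", "//admin/pma/index.php",
                     "//admin/phpmyadmin/index.php", "//db/index.php", "//dbadmin/index.php"]
    lines = [synthetic_line("192.0.2.10", p, 404) for p in scanned_paths]
    lines += [synthetic_line("198.51.100.7", "/", 200),
              synthetic_line("198.51.100.7", "/favicon.ico", 404)]
    print(hosts_with_many_404s(lines))  # {'192.0.2.10': 6}
```

A real deployment would also look at timing (many 404s within seconds), which this sketch ignores.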
> Hopefully, most webservers are not really vulnerable to this type of scan.
> But the fact is that *these scans are happening, every day, on millions of webservers*.
> And they are at least a nuisance, and at worst a serious security problem when, as a
> result of poorly configured webservers or applications, they lead to break-ins and
> compromised systems.
> It is basically a numbers game, like malicious emails: it costs very little to do this,
> and if even a tiny proportion of webservers exhibit one of these vulnerabilities, then
> because of the numbers involved, it is worth doing.
> If there are 600 Million webservers, and 50% of them are scanned every day, and 0.01% of
> these webservers are vulnerable because of one of these URLs, then every day
> 30,000 (600,000,000 x 0.5 x 0.0001) vulnerable servers will be identified.
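The estimate is a plain product of three factors; a quick Python check, using the message's own assumed shares:

```python
def vulnerable_found_per_day(servers, share_scanned, share_vulnerable):
    """Expected number of vulnerable servers identified per day."""
    return servers * share_scanned * share_vulnerable


# Assumptions from the message: 600 million servers, 50% scanned daily, 0.01% vulnerable.
found = vulnerable_found_per_day(600_000_000, 0.50, 0.0001)
print(f"{found:,.0f}")  # 30,000
```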
> About the "cost" aspect: from the data in my own logs, such bots seem to be scanning
> about 20-30 URLs per pass, at a rate of about 3-4 URLs per second.
> Since it takes my Apache httpd servers approximately 10 ms on average to respond (with a
> 404 Not Found) to one of these requests, and the bots only request 1 URL per 250 ms, I would
> imagine that these bots have some built-in rate-limiting mechanism, to avoid being
> "caught" by various webserver-protection tools. Maybe they are also smart, and scan
> several servers in parallel, so as to limit the rate at which they "burden" any server
> in particular. (In this rough calculation, I am ignoring network latency for now.)
> So if we imagine a smart bot which is scanning 10 servers in parallel, issuing 4 requests
> per second to each of them, for a total of 20 URLs per server, and we assume that all
> these requests result in 404 responses with an average response time of 10 ms, then it
> "costs" this bot only about 2 seconds to complete the scan of 10 servers.
> If there are 300 Million servers to scan, then the total cost for scanning all the
> servers, by any number of such bots working cooperatively, is an aggregated 60 Million
> seconds (30 Million batches of 10 servers, at 2 seconds each). And if one such "botnet"
> has 10,000 bots, that boils down to only 6,000 seconds per bot.
> Scary, that 50% of all Internet webservers can be scanned for vulnerabilities in less than
> 2 hours, and that such a scan may result in "harvesting" several thousand hosts that are
> candidates for takeover.
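The baseline arithmetic can be reproduced in a few lines of Python. This sketch follows the message's own simplified model: the cost of a batch is the sum of the 10 ms response times of every request in it, while network latency and the bots' own pacing are ignored.

```python
def baseline_scan_cost(servers, parallel, urls_per_server, response_s, bots):
    """Return (seconds per batch, aggregate seconds, seconds per bot)."""
    batch_s = parallel * urls_per_server * response_s   # 10 x 20 x 10 ms
    total_s = servers / parallel * batch_s              # number of batches x batch cost
    return batch_s, total_s, total_s / bots


batch_s, total_s, per_bot_s = baseline_scan_cost(
    servers=300_000_000, parallel=10, urls_per_server=20, response_s=0.010, bots=10_000)
print(batch_s, total_s, per_bot_s)   # 2.0 60000000.0 6000.0
print(per_bot_s / 3600)              # about 1.67 hours
```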
> Now, how about making it so that, without any special configuration, add-on software or
> skills on the part of webserver administrators, it would cost these same bots *about [...]
> times as long (several days)* to do their scan?
> The only cost would be a relatively small change to the Apache webservers, which is what my
> suggestion consists of: adding a variable delay (say between 100 ms and 2000 ms) to every
> 404 response.
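To experiment with the idea without touching httpd itself, here is a minimal sketch as Python WSGI middleware, standard library only. It is not how the proposal would be implemented inside Apache; the class name `Delay404Middleware`, the demo application and the port are hypothetical, and only the 100-2000 ms bounds come from the message. The delay is applied inside `start_response`, before the status reaches the server, so it works whether the wrapped application returns a list or a generator.

```python
import random
import time
from wsgiref.simple_server import make_server


class Delay404Middleware:
    """Hypothetical middleware: sleep a random 100-2000 ms before any 404 goes out."""

    def __init__(self, app, min_ms=100, max_ms=2000, sleep=time.sleep):
        self.app = app
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.sleep = sleep  # injectable so the delay can be observed in tests

    def __call__(self, environ, start_response):
        def delayed_start_response(status, headers, exc_info=None):
            # Only 404 responses are delayed; every other status passes straight through.
            if status.startswith("404"):
                self.sleep(random.uniform(self.min_ms, self.max_ms) / 1000.0)
            return start_response(status, headers, exc_info)

        return self.app(environ, delayed_start_response)


def demo_app(environ, start_response):
    """Toy application: '/' exists, everything else is 404."""
    if environ.get("PATH_INFO", "/") == "/":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"hello\n"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]


if __name__ == "__main__":
    with make_server("127.0.0.1", 8000, Delay404Middleware(demo_app)) as httpd:
        httpd.serve_forever()
```

Requesting `http://127.0.0.1:8000/` answers at once, while any other path waits between 0.1 and 2 s before its 404. A blocking sleep like this keeps the worker serving the request busy for the whole delay; with wsgiref's single-threaded server it even stalls every other client, so the cost is not borne by the scanner alone.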
> The suggestion is based on the observation that there is a dichotomy between this kind of
> access by bots, and the kind of access made by legitimate HTTP users/clients: legitimate
> users/clients (including the "good bots") mostly access links "which work", so they
> rarely get "404 Not Found" responses. Malicious URL-scanning bots, on the other hand, by
> the very nature of what they are scanning for, get many "404 Not Found" responses.
> Thus, as a general idea, anything which increases the delay to obtain a 404 response should
> impact these bots much more than it impacts legitimate users/clients.
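To make "much more" concrete, here is a back-of-the-envelope comparison in Python. The session sizes are hypothetical (a visitor making 50 requests of which one is a 404, a scanner making 20 requests to one server that all return 404), requests are assumed to be issued one after another, each waiting for its response, and the delay is assumed to be drawn uniformly from the 100-2000 ms range, whose mean is 1.05 s.

```python
def expected_added_delay(requests, share_404, mean_delay_s):
    """Expected extra waiting time when each 404 costs `mean_delay_s` on average."""
    return requests * share_404 * mean_delay_s


MEAN_DELAY_S = (0.100 + 2.000) / 2  # mean of a uniform 100-2000 ms delay: 1.05 s

visitor = expected_added_delay(requests=50, share_404=1 / 50, mean_delay_s=MEAN_DELAY_S)
scanner = expected_added_delay(requests=20, share_404=1.0, mean_delay_s=MEAN_DELAY_S)
print(f"visitor: +{visitor:.2f} s, scanner: +{scanner:.2f} s per server")
# visitor: +1.05 s, scanner: +21.00 s per server
```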
> How much?
> Let us imagine for a moment that this suggestion is implemented in the Apache webservers,
> and is enabled in the default configuration. And let's imagine that after a while, 20% of
> the Apache webservers deployed on the Internet have this feature enabled, and are now
> delaying any 404 response by an average of 1000 ms.
> And let's re-use the numbers above, and redo the calculation.
> The same "botnet" of 10,000 bots is thus still scanning 300 Million webservers, each
> scanning 10 servers at a time for 20 URLs per server.  Previously, this took about 6000
> seconds.
> However now, instead of an average delay of 10 ms to obtain a 404 response, in 20% of
> cases (60 Million webservers) they will experience an average additional delay of 1000 ms
> per URL scanned.
> This adds 120,000,000 seconds to the scan (60,000,000 servers / 10 in parallel x 20 URLs x 1 s).
> Divided by 10,000 bots, this is 12,000 additional seconds per bot (roughly 3 1/3 hours).
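The recalculation can be checked the same way. This sketch mirrors the message's arithmetic, including its assumption that the delays of the 10 servers a bot scans in parallel overlap, so each batch adds 20 x 1 s; the baseline of 6,000 s per bot is the figure from the earlier calculation.

```python
def added_cost_with_delay(delayed_servers, parallel, urls_per_server, delay_s, bots):
    """Return (aggregate extra seconds, extra seconds per bot)."""
    batches = delayed_servers / parallel          # delayed servers scanned in batches of 10
    extra_total_s = batches * urls_per_server * delay_s
    return extra_total_s, extra_total_s / bots


extra_total_s, extra_per_bot_s = added_cost_with_delay(
    delayed_servers=60_000_000, parallel=10, urls_per_server=20, delay_s=1.0, bots=10_000)
baseline_per_bot_s = 6_000
print(extra_total_s, extra_per_bot_s)                                # 120000000.0 12000.0
print((baseline_per_bot_s + extra_per_bot_s) / baseline_per_bot_s)   # 3.0
```

The resulting factor of 3.0 in time per bot is what underlies the "more than double" estimate.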
> So with a small change to the code, no add-ons, no special configuration skills on the
> part of the webserver administrator, no firewalls, no filtering, no need for updates to
> any list of URLs or bot characteristics, little inconvenience to legitimate users/clients,
> and a very partial adoption over time, it seems that this scheme could more than double
> the cost for bots to acquire the same number of targets.  Or, seen another way, it could
> more than halve the number of webservers being scanned every day.
> I know that this is a hard sell.  The basic idea sounds a bit too simple to be effective.
> It will not kill the bots, and it will not stop the bots from scanning Internet servers in
> the other ways that they use. It does not miraculously protect any single server against
> scans, and the benefit of any one server implementing this is diluted over all webservers
> on the Internet.
> But it is also not meant as an absolute weapon.  It is targeted specifically at a
> particular type of scan done by a particular type of bot for a particular purpose, and
> is just a scheme to make this more expensive for them.  It may or may not discourage
> bots from continuing with this type of scan (if it does, that would be a very big result).
> But at the same time, compared to any other kind of tool that can be used against these
> scans, this one seems really cheap to implement, it does not seem to be easy to
> circumvent, and it seems to have at least the potential of bringing big benefits to the WWW
> at large.
> If there are reasonable objections to it, I am quite prepared to accept that, and drop it.
> I have already floated the idea in a couple of other places, and gotten what could be
> described as "tepid" responses. But it seems to me that most of the negative-leaning
> responses which I received so far were more of the a priori "it will never work" kind,
> rather than real objections based on real facts.
> So my hope here is that someone has the patience to read through this, and would have
> additional patience to examine the idea "professionally".

Christian Folini - <>
