tomcat-users mailing list archives

From André Warnier
Subject Re: Tomcat access log reveals hack attempt: "HEAD /manager/html HTTP/1.0" 404
Date Mon, 22 Apr 2013 08:49:01 GMT
chris derham wrote:
>> Let me just summarise my arguments then :
>> 1) These scans are a burden for all webservers, not just for the vulnerable
>> ones.  Whether we want to or not, we currently all have to invest resources
>> into countering (or simply responding to) these scans.  Obviously, just
>> ignoring them doesn't stop them, and just protecting one's own servers
>> against them doesn't stop them in a general sense.
>> 2) there is a fundamental asymmetry between how bots access a server (and
>> most of the responses that they get), and how "normal" clients access a
>> server : "normal" clients receive mostly non-404 responses, while bots - by
>> the very nature of what they are looking for - receive many 404 responses.
>> So anything that would in some way "penalise" 404 responses with respect to
>> other ones, should impact bots much more than normal clients
>> 3) setting up a bot to perform such a scanning operation has a cost; if the
>> expected benefit does not cover the cost, it makes no sense to do it.
>> Assuming that botmasters are rational, they should stop doing it then. It is
>> debatable what proportion of servers would need to implement this proposal
>> in order for this kind of bot-scanning to become uneconomical in a general
>> sense.  What is certain is that, if none do and no better general scheme is
>> found, the scans will continue.  It is also fairly certain that if all
>> servers did, this particular type of scan would stop.
>> 4) it is not obvious right now which method bots could use to circumvent
>> this in order to continue scanning HTTP servers for these known potentially
>> vulnerable URLs. I do not discount that these people are smart, and that
>> they could find a way.
>> But so far it would seem that every scheme thought of by people commenting on
>> this idea has its own costs in some way and does not invalidate the basic
>> idea.
>> 5) if the scheme works, and it does have the effect of making this type of
>> server-scanning uneconomical, bot developers will look for other ways to
>> find vulnerable targets.
>> It is just not obvious to me where they would move their focus, HTTP-wise.
>> If their aim is to find vulnerable URLs on webservers, what else can they do
>> but try them ?
>> 6) intuitively, it seems that implementing this would not be very
>> complicated, and that the foreseeable cost per server, in terms of
>> complexity and performance, would be quite low.  The burden imposed on
>> normal clients would also seem to be small.
>> Maybe this should be evaluated in terms of a comparison with any other
>> method that could provide some similar benefit at lower costs.
>> 7) once implemented, it would be something which does not require any
>> special skills or any special effort on the part of the vast majority of
>> people that download and install tomcat.  Which means that it has a real
>> chance to automatically spread over time to a large proportion of servers.
>> This is quite unlike any other bot-fighting measure that I have seen
>> mentioned so far in this thread.
>> 8) an obvious drawback to this scheme, is that if it works, it would take a
>> long time to show its effects, because
>> a) it would take a long time before a significant proportion of active
>> servers implement the scheme
>> b) even then, it would probably take an even longer time for the bots to
>> adapt their behaviour (the time for the current generation to die out)
>> So in politics, this would be a no-no, and I will probably never get a Nobel
>> prize for it either.  Damn. I would welcome any idea to spread this faster
>> and allow me to gain a just recognition for my insights however.
> So a miscreant decides that they want to hack into a computer. Like
> most things in computing, they break the task down into smaller more
> manageable tasks. Step 1 is to find targets. The easiest way would seem to
> be to enumerate every possible IPv4 address, and send a TCP/IP packet
> to some known ports. If you get a response, it's a live IP address. You
> don't need to map every port, just establish if the host is listening
> to the internet. This will allow you to build up a list of live IP
> addresses and feed into step 2
> Step 2 fingerprint those IP addresses. To do this, use a scanning
> tool. These send packets to ports of a given IP address, looking at
> the responses. They don't just look for positive responses, they also
> send badly formed/invalid packets. They use many techniques to do
> this. My favorite is the xmas tree packet. The low level TCP protocol
> defines several fields as control fields - the xmas tree packet flags
> all control fields as true. The packet is completely invalid at a TCP
> level, but different os'es will respond differently. The results of
> all of these responses provide a fingerprint, which should provide an
> identification of what os the server is running. Using similar
> techniques it is generally possible to identify the software stack
> running on each port. Sometimes there will be 100% confidence in the
> results, sometimes less. Sometimes the software can't tell what the
> software stack on the server is. However the aim of the game is to
> work out which os and which software is running on the port. The
> miscreants are after the low hanging fruit anyway right? So they build
> up a list of IP addresses with software running on each port, and feed
> to step 3
> Step 3 If all has gone well in steps 1 and 2, you now have a list of IP
> addresses with names and versions of os and the server side software
> running, and in some cases patch level. Combine this with any of the
> publicly available exploit databases, and you can cherry pick which of
> the low hanging fruit you wish to attack using known exploits that
> haven't been patched yet.
> Step 4 is if you don't have any targets with known exploits, then you
> have to start looking for holes manually. The value varies, but they
> say that there is one exploitable defect per thousand lines of code.
> With this in mind, and an os/server stack/app stack combining to
> contain many millions of lines of code, there should be ample scope
> for finding a hole. Most os'es and app servers are reviewed by
> security experts, and have been battle hardened. Most apps have not.
> Apps seem to be the common weak point, second only to users and weak,
> reused passwords. The scanners are getting better and better each day.
> Some are now capable of detecting SQL injection defects in forms, and
> flagging a site as ripe for targeting.
> So coming back to your proposal. My first point is that step 1 uses
> TCP/IP connections, so the probing occurs lower down the stack. Hence
> delaying 404 responses will not stop or affect them. Some of step 2
> can occur just looking for normal headers, or by using malformed
> packets against pages that exist, i.e. will not result in 404 responses.
> My second point is that once they have fingerprinted your server,
> they may be able to launch an exploit. For badly un-patched systems,
> they may never even see a 404 in their logs, as the miscreants may
> break in without triggering one. In short I believe that making
> requests that result in 404 are not the things admins should be
> worried about. There may be some script kiddies out there, probing all
> web sites they find on google for tomcat manager apps. If that is
> their skill level, then I would suggest that you shouldn't worry too
> much about them.
> If your approach was successfully implemented across all
> patched/updated web servers, then the miscreants would still carry on
> probing as there would still be many thousands or millions of servers out
> there that are not patched, and hence not running the delay 404
> software. I know that your argument is that over time, this would
> reduce. However there are still millions of users out there running
> Windows XP (20% according to
> Whilst I know that
> this shouldn't reflect the os used server side, my point is that for
> ~10 years, there will still be badly patched app servers out there not
> running the delay 404 patch. So for the next 5 years (at a conservative
> estimate), it will still be worth searching out these old servers. I
> know your argument is that after a percentage of web servers have the
> 404 delay software in place, scanners will slow. My points are a) the
> scanners will fingerprint the newer releases and not scan them b) most
> scans from real hackers will not result in 404s. There may be some,
> but most of their probing will not return these responses.
> I think that you have articulated your suggestion very well. I think
> you have weighed the pros well and been open to debate. Personally I
> just don't think what you propose will have the effect that you
> desire. However since I seem to be the only voice of dissent, I will
> stop now. I would like some other list members to weigh in
> with their thoughts - PID it is not like you to be shy of coming
> forward. What are your thoughts?
> Personally end-user/developer/administrator education would seem a
> prudent avenue to reduce the problems on the modern internet.

Thank you for your thoughtful responses and comments.
The above should be required reading for would-be botmasters.

I feel that I have to add a couple of comments still.

I am fully aware that bots nowadays are sophisticated beasts, and that they use many ways
to spread and infect new hosts, sometimes astonishingly broadly and quickly. And I know
that finding and breaking into webservers that have "vulnerable" URLs is only one tiny
facet of how they operate, and probably far from the main one (which still seems to be
email attachments opened carelessly by users themselves).

If I led anyone to think that I believed that implementing a delay in webservers' 404
responses would kill bots in general, I apologise for the misrepresentation.
The proposed 404 delay is meant only to discourage bots from scanning for vulnerable URLs
in the way in which they (or some of them) seem to be doing it currently.
The origin of my proposal is in fact my personal annoyance at seeing these scans appear
relentlessly, for years now, in the logs of all of my internet-facing servers.  I had been
thinking for a long time (and probably not alone) of some way to get rid of them which
would not require additional resources to be spent by my own infrastructure, nor require a
major investment in terms of configuration and setup, nor make life more unpleasant for
legitimate users, and which would thus be potentially usable on a majority of webservers
on the Internet.  Like squaring the circle.
Then I hit on this idea, which seems to have at least the potential to annoy bots in this
particular case.
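
To make the idea concrete, here is a minimal sketch of a server that delays only its 404
responses. It is written in Python with the standard library rather than as Tomcat code,
and every name in it (DELAY_404_SECONDS, KNOWN_PATHS, delay_for_status, Delayed404Handler)
is hypothetical, chosen for illustration. The 1 s value is the delay discussed in this
thread; nothing else here is an existing Tomcat feature.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical settings for this sketch; not Tomcat configuration.
DELAY_404_SECONDS = 1.0
KNOWN_PATHS = {"/": b"hello\n"}


def delay_for_status(status, delay_404=DELAY_404_SECONDS):
    """Return how many seconds to wait before sending a response."""
    return delay_404 if status == 404 else 0.0


class Delayed404Handler(BaseHTTPRequestHandler):
    delay_404 = DELAY_404_SECONDS  # can be overridden per server

    def _respond(self, send_body):
        body = KNOWN_PATHS.get(self.path)
        status = 200 if body is not None else 404
        if body is None:
            body = b"not found\n"
        # Only 404 responses are held back; other responses go out at once.
        time.sleep(delay_for_status(status, self.delay_404))
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if send_body:
            self.wfile.write(body)

    def do_GET(self):
        self._respond(send_body=True)

    def do_HEAD(self):
        self._respond(send_body=False)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Build a threaded server on localhost (port 0 picks a free port)."""
    return ThreadingHTTPServer(("127.0.0.1", port), Delayed404Handler)
```

Calling make_server(8080).serve_forever() would answer "HEAD /manager/html" with a 404
after about one second, while "GET /" returns immediately. Note that this naive version
holds one thread per delayed request, which is a cost a real implementation would
probably want to avoid.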

Bots are sophisticated and multi-faceted. So it is futile to look for one definitive
weapon to get rid of them, and the approach also has to be multi-faceted. This could be
one of these facets, no more and no less.
Following your comments above, I have done some additional back-of-the-envelope
calculations, which seem to show that even applying your bot-efficiency principles above,
adding a 404 delay of 1 s on less than 20% of Internet webservers would already slow down
this particular line of enquiry/attack by this particular kind of bot by at least 50%
(details on request).  Independently of everything else, I believe that this is already
something worth having, and unless someone can prove that the approach is wrong-headed, I
will thus continue to advocate it.
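
The detailed calculation is not reproduced in this message, so the following is only one
possible way to arrive at a figure of that order, under assumptions that are not stated in
the thread: the bot probes servers one after the other, a probe normally costs it about
0.2 s, and every probe against a delaying server ends in a delayed 404. The function name
scan_slowdown and the 0.2 s baseline are hypothetical.

```python
def scan_slowdown(base_seconds, fraction_delaying, delay_seconds):
    """Fractional drop in probes per second for a sequential scanner.

    base_seconds:      assumed time per probe without any 404 delay
    fraction_delaying: share of probed servers that delay their 404s
    delay_seconds:     the 404 delay on those servers
    """
    average = base_seconds + fraction_delaying * delay_seconds
    return 1.0 - base_seconds / average


# Assumed 0.2 s per probe, 20% of servers delaying their 404s by 1 s.
print(scan_slowdown(0.2, 0.2, 1.0))  # about 0.5, i.e. half the scan rate
```

A bot that runs many probes in parallel would lose less than this, and a slower baseline
probe would also shrink the effect, so the result depends heavily on the assumed numbers.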

But honestly, I am also a bit at a loss now as to how to continue.  There is of course no
way for me to prove the validity of the scheme by installing it on 31 million (20%) of
the webservers on the Internet and looking at the resulting bot activity patterns to
confirm my suspicions.

The Wikipedia article on botnets mentions the following :
"Researchers at Sandia National Laboratories are analyzing botnets behavior by 
simultaneously running one million Linux kernels as virtual machines on a 4,480-node 
high-performance computer cluster.[12]"
Maybe I should get in touch with them ?
