httpd-users mailing list archives

From "Wolter Kamphuis" <apa...@wkamphuis.student.utwente.nl>
Subject Re: Wget
Date Mon, 26 Aug 2002 13:13:31 GMT
Hi,

I also had some problems with web spiders. A website I’m running consists
of many (1500) pages, each showing one image, like a gallery. People who
wanted all the images simply let wget do a recursive download of the
complete website. As a result, almost half of my traffic went to those
spiders.

I now use robotcop (http://www.robotcop.org/) to block web spiders. On some
of my pages (especially dynamic ones) I include a one-pixel image link.
Anyone who follows this link is blocked for two days. Normal browsers
won't follow the link, so they are unaffected. I catch about 10 to 20
people a day using wget, Teleport Pro and similar spiders.
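
A rough sketch of the trap idea described above, in Python. This is not
robotcop's actual code; the class name, the trap URL and the in-memory
dictionary are all assumptions made for illustration. Any client that
requests the trap path is refused for two days.

```python
import time

BLOCK_SECONDS = 2 * 24 * 60 * 60  # two days, as in the setup described above
TRAP_PATH = "/trap.gif"           # hypothetical URL of the one-pixel image link


class TrapBlocklist:
    """Remembers clients that followed the trap link and when their block ends."""

    def __init__(self, block_seconds=BLOCK_SECONDS, clock=time.time):
        self.block_seconds = block_seconds
        self.clock = clock           # injectable so the expiry logic can be tested
        self.blocked_until = {}      # client IP -> expiry timestamp

    def is_blocked(self, ip):
        expiry = self.blocked_until.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self.blocked_until[ip]  # block has expired, forget the client
            return False
        return True

    def handle(self, ip, path):
        """Return an HTTP status code for a request from `ip` for `path`."""
        if self.is_blocked(ip):
            return 403
        if path == TRAP_PATH:
            # Only a spider should ever request this path, so start the block.
            self.blocked_until[ip] = self.clock() + self.block_seconds
            return 403
        return 200
```

Keying on the IP address alone is the simplest choice, but it also means
that everyone behind a shared proxy is blocked together.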

However, there are some issues with robotcop. There is always a chance
you will block innocent users; about one or two of the spiders I catch
each day are innocent users. There’s not much I can do about it, since I
don’t know why they follow the ‘invisible link’. Still, one or two out of
30k visitors isn’t that much.

Also, if you have robotcop behave like a tarpit (very slowly serving crap
to the clients), every caught spider will occupy one (or more) Apache
processes. In that case it’s easy to perform a DoS attack against the
server if you have the right tools. I found a way to solve this by
building a special ‘tarpitd’ daemon that handles the ‘crap serving’. It
also helps against worms and people trying to scan Apache: scanning my
webserver takes hours to complete.
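
To illustrate why a separate daemon helps, here is a minimal Python sketch
of a tarpit that keeps all slow connections inside one event loop instead
of tying up an Apache process per spider. It is not the ‘tarpitd’
mentioned above, whose code is not shown here; the function names, port,
payload and delay are assumptions.

```python
import asyncio


async def drip(writer, payload=b"x" * 64, delay=5.0, chunk=1):
    """Send `payload` a few bytes at a time, sleeping between chunks."""
    sent = 0
    try:
        for i in range(0, len(payload), chunk):
            piece = payload[i:i + chunk]
            writer.write(piece)
            await writer.drain()
            sent += len(piece)
            await asyncio.sleep(delay)  # yields to other connections meanwhile
    except ConnectionError:
        pass  # the client gave up; nothing more to do
    finally:
        writer.close()
    return sent


async def handle_client(reader, writer):
    # Each caught spider costs only one suspended coroutine, not a process.
    await drip(writer)


async def main(host="127.0.0.1", port=8081):
    # Hypothetical address; blocked clients would be redirected or proxied here.
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```

Calling asyncio.run(main()) starts the listener. With the defaults above,
each connection is held for a little over five minutes while receiving
only 64 bytes.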

mzzl
  Wolter


> Is there a way to protect the websites on my server from someone using
> Wget??
>
> Any help is appreciated.
>
> TIA.
>
> Tom
>
>
> ---------------------------------------------------------------------
> The official User-To-User support forum of the Apache HTTP Server
> Project. See <URL:http://httpd.apache.org/userslist.html> for more info.
> To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
>    "   from the digest: users-digest-unsubscribe@httpd.apache.org
> For additional commands, e-mail: users-help@httpd.apache.org





