httpd-users mailing list archives

From dale's stuff <st...@colony.net>
Subject Re: Wget
Date Mon, 26 Aug 2002 12:01:25 GMT
Hello,

On Monday, August 26, 2002, at 12:09  PM, Rodent of Unusual Size wrote:

> Boyle Owen wrote:
>>
>> What do you have against wget? If you put pages on the web, they are
>> publicly available, so what do you care what agent people use to browse
>> them?

Ahh... because too many people use such tools for purposes of the dark
side, e.g. stealing my content, images, and so forth.

> Wget is hardly a browser.  I have it blocked because I kept getting
> hammered by recursive site scraping, and found some of my pages
> reproduced elsewhere as a consequence.

I have encountered the same thing, with people doing what appears to be a 
DoS attack on my server.

>> If it's just that you don't like robots because they don't read your
>> adverts, then create a robots.txt file in your docroot and "Disallow: /"
>> (see http://www.robotstxt.org/wc/norobots.html).
>
> Does wget honour robots.txt?

By default, yes. However, there is a wgetrc command (robots = off, or
-e robots=off on the command line) that makes wget ignore the robots.txt
file.
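
To illustrate what honouring robots.txt means for a well-behaved robot, here is a minimal sketch in Python using the standard library's robots.txt parser. The host name www.example.com, the "Wget/1.8.2" agent string, and the allowed() helper are made up for the example; this is not how wget itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt suggested earlier in the thread: every robot, whole site off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite robot asks before fetching; with "Disallow: /" the answer is no.
    print(allowed("Wget/1.8.2", "http://www.example.com/index.html"))
```

The check is purely voluntary: a client that never asks, or that ignores the answer, fetches the page anyway.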

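Also, you can have wget masquerade as a different browser by sending another
User-Agent string (-U / --user-agent), so blocking on the User-Agent header
alone is easy to get around.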
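
A rough sketch of why masquerading matters if you block by agent name. The blocked_by_user_agent() filter below is a hypothetical stand-in for whatever rule the server applies, and the agent strings and URL are only examples.

```python
from urllib.request import Request


def blocked_by_user_agent(user_agent, banned=("wget",)):
    """Hypothetical server-side filter: reject agents whose name contains a banned word."""
    ua = user_agent.lower()
    return any(word in ua for word in banned)


def build_request(url, user_agent):
    """Build (but do not send) a request carrying a User-Agent chosen by the client."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    honest = build_request("http://www.example.com/", "Wget/1.8.2")
    disguised = build_request(
        "http://www.example.com/",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    )
    for req in (honest, disguised):
        # urllib stores header names capitalized as "User-agent".
        ua = req.get_header("User-agent")
        print(ua, "->", "blocked" if blocked_by_user_agent(ua) else "allowed")
```

The header is whatever the client chooses to send, so a filter like this only stops tools that identify themselves honestly.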


> --
> #ken	P-)}
>
> Ken Coar, Sanagendamgagwedweinini  http://Golux.Com/coar/
> Author, developer, opinionist      http://Apache-Server.Com/
>
> "Millennium hand and shrimp!"

Dale


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

