Mailing-List: contact nutch-agent-help@lucene.apache.org; run by ezmlm
Reply-To: nutch-agent@lucene.apache.org
Date: Tue, 27 Sep 2005 21:43:42 -0700
From: WebExpertsAmerica
Subject: RE: Your Nutch Crawler is Out of Control - Apache Notified
To: Wild Dancer
cc: nutch-agent@lucene.apache.org, abuse@cac.washington.edu, noc@cac.washington.edu
Message-ID: <20050927214342.c8e3a99245005986d0d6719cee4603ee.1db408236e.wbe@email.email.secureserver.net>
MIME-Version: 1.0
Content-Type: TEXT/plain; CHARSET=US-ASCII

With all due respect, who the hell are you?

Why is a Canadian emailing us about a server located at UW?

Why is a UW webserver configured with Nutch (or aliased as Nutch) ignoring our robots.txt file?

Something smells...

Best Regards,

Web Experts America

>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<
WebExpertsAmerica.com
Whole Lot More for a Whole Lot Less
$6/hr Professional Web Services http://www.WebExpertsAmerica.com

Testimonials: http://www.WebExpertsAmerica.com/testimonials.htm

Website Solutions: http://www.WebExpertsAmerica.com/services.htm

Chat:
WebExpertsNOW
AOL, MSN (Hotmail), and Yahoo
*Contact us anytime via chat. However, we DENY, BLOCK, and BAN anyone that adds us to their Friend/Buddy list. Nothing personal, a security policy to protect our chat connectivity from competitor abuse.

Terms of Service: http://www.WebExpertsAmerica.com/tos.htm

Confidential:
The information contained in this message is privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by replying to this message and then delete it from your computer.

> -------- Original Message --------
> Subject: RE: Your Nutch Crawler is Out of Control - Apache Notified
> From: "Wild Dancer"
> Date: Tue, September 27, 2005 11:17 pm
> To: "'WebExpertsAmerica'"
> Cc: , ,
>
> N e t i q u e t t e
>
> 1. Someone uses "Nutch..." as an Agent Identity
> 2. Someone does not obey Netiquette
>
> None of this is related to Nutch itself... This person could just as
> easily use "Teleport Pro" as an identity, or even
> User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET
> CLR 1.1.4322)
>
> Simply block their IP.
>
> -----Original Message-----
> From: WebExpertsAmerica [mailto:expert@WebExpertsAmerica.com]
> Sent: Tuesday, September 27, 2005 12:40 AM
> To: Wild Dancer
> Cc: nutch-agent@lucene.apache.org; abuse@cac.washington.edu;
> noc@cac.washington.edu
> Subject: RE: Your Nutch Crawler is Out of Control - Apache Notified
> Importance: High
>
> And you ignore our robots.txt file - what sort of game is this?
> Crawling our site for 3 hours every day.
>
> And... why is this email coming from a private account in Canada and not
> a university account where the server is located?
>
> Here is your IP...
>
> 70.30.209.252
>
> Stop your crawler from hitting our servers!
>
> The rule is, you follow the rules, and obey our robots.txt file!
>
> What sort of arrogant techie attitude is this - we would expect much
> more from UW!
>
> Web Experts America
>
> > -------- Original Message --------
> > Subject: RE: Your Nutch Crawler is Out of Control - Apache Notified
> > From: "Wild Dancer"
> > Date: Mon, September 26, 2005 11:18 pm
> > To:
> > Cc:
> >
> > Obviously, Web Experts have very bad upload bandwidth.
> >
> > Frankly, a classic installation of Apache with 150 "connections" will
> > fail against 15 threads of Nutch; that has nothing to do with
> > bandwidth, even if it is 8Mbps/800kbps for home-based sites.
> >
> > Maybe Web Experts need to tune Apache Web Server and use the "worker"
> > model instead of "pre-fork"? It allows handling 6000 concurrent users
> > (1024 RAM)... It saves memory by using threads instead of processes...
> >
> > -----Original Message-----
> > From: WebExpertsAmerica [mailto:expert@WebExpertsAmerica.com]
> > Sent: Friday, September 23, 2005 3:26 PM
> > To: abuse@cac.washington.edu; noc@cac.washington.edu
> > Cc: nutch-agent@lucene.apache.org
> > Subject: Your Nutch Crawler is Out of Control - Apache Notified
> > Importance: High
> >
> > Your crawler is ignoring our robots.txt file.
> >
> > http://lucene.apache.org/nutch/bot.html;
> > nutch-agent@lucene.apache.org)" 128.95.1.189
> >
> > You are eating bandwidth at our domain in incredible amounts. This is
> > rude.
> >
> > Please stop or we will be forced to block your IP and the crawler you
> > are using.
> >
> > Best Regards,
> >
> > Web Experts America
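
For readers following the thread, here is a minimal sketch of what "obeying robots.txt" means for a well-behaved crawler, using Python's standard urllib.robotparser module. The robots.txt rules, the example.com URLs, and the helper names build_parser and may_fetch are all assumptions made for illustration; the complainant's actual robots.txt is not shown anywhere in the thread. As noted in the replies above, a client can send any User-Agent string it likes, so this check only describes what a cooperating crawler would do. It cannot stop one that ignores the rules, which is why blocking by IP was suggested.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption; the real file is not
# quoted in the thread). It bans the "Nutch" agent from the whole site
# and keeps every other agent out of /private/.
ROBOTS_TXT = """\
User-agent: Nutch
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this agent to request this URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A cooperating crawler runs this check before every request and skips
# any URL for which the answer is False.
print(may_fetch(parser, "Nutch", "http://example.com/index.html"))
print(may_fetch(parser, "OtherBot", "http://example.com/index.html"))
print(may_fetch(parser, "OtherBot", "http://example.com/private/report.html"))
```

With the hypothetical rules above, the first and third checks are refused and the second is allowed. Whether the crawler in this thread performed such a check is exactly what the participants dispute, and the sketch makes no claim either way.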