From: EM <emilijan@cpuedge.com>
Date: Fri, 13 May 2005 14:20:22 -0400
To: nutch-user@incubator.apache.org
Subject: Re: How does this sound
Message-ID: <4284EFE6.2050109@cpuedge.com>

Sounds fine to me, although more experienced people here may have different opinions. One small thing: if you are setting up each site individually, fully disable the spidering. That way, you can inject the individual sites yourself.
Good luck,
Emilijan

Ian Reardon wrote:
>I am going to crawl a small set of sites. I never want to go off
>site, and I also want to strictly control my link depth.
>
>I set up crawls for each site using the crawl command, then manually
>move the segments folder to my "master" directory and re-index. (This
>can all be scripted.) This gives me the flexibility to QA each
>individual crawl.
>
>Am I jumping through unnecessary hoops here, or does this sound like a
>reasonable plan?
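
For anyone following the thread, the per-site workflow described above might be scripted roughly like this. This is only a sketch: it assumes a Nutch 0.x checkout, and the directory names (`urls/site1.txt`, `crawl/site1`, `master/`) are illustrative, not from the original mail. Exact `bin/nutch` sub-command arguments vary between Nutch versions, so check your version's usage output before running anything.

```shell
# Restrict the crawl to one domain first by editing conf/crawl-urlfilter.txt
# so that only this site's URLs are accepted -- that is what keeps the
# one-stop crawl command from wandering off-site.

# Crawl a single site into its own directory; -depth caps the link depth.
bin/nutch crawl urls/site1.txt -dir crawl/site1 -depth 3

# After QA'ing the result, copy the new segments into the master directory.
cp -r crawl/site1/segments/* master/segments/

# Re-index over the segments in the master directory (in Nutch 0.x the
# index command operates per segment; loop over them as needed).
for seg in master/segments/*; do
    bin/nutch index "$seg"
done
```

The point of keeping one crawl directory per site is exactly the flexibility mentioned in the mail: a bad crawl can be inspected or re-run in isolation without touching the master index.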