From: Ian Reardon <irnutch@gmail.com>
To: nutch-user@incubator.apache.org
Date: Fri, 13 May 2005 13:54:31 -0400
Subject: How does this sound

I am going to crawl a small set of sites. I never want to go off site, and I want to strictly control my link depth.

My plan: set up a separate crawl for each site using the crawl command, then manually move each segments folder into my "master" directory and re-index. (This can all be scripted.) This gives me the flexibility to QA each individual crawl before it goes into the master index.

Am I jumping through unnecessary hoops here, or does this sound like a reasonable plan?
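
To make it concrete, here is roughly the script I have in mind. The site names and directory layout are placeholders, and the crawl/index invocations are just my reading of the tutorial, so treat it as a sketch:

#!/bin/sh
# Sketch only -- site names and directory layout are made up.
# Staying on site is assumed to be handled by conf/crawl-urlfilter.txt,
# restricted to just these hosts.

SITES="siteA siteB siteC"
MASTER=master        # the combined "master" directory
DEPTH=3              # strict link-depth cap

# One self-contained crawl per site, with its own url list file
# (tutorial-style invocation)
for SITE in $SITES; do
    bin/nutch crawl urls/$SITE.txt -dir crawls/$SITE -depth $DEPTH
done

# ...QA each crawls/$SITE by hand here before going further...

# Promote the approved segments into the master directory
mkdir -p $MASTER/segments
for SITE in $SITES; do
    cp -r crawls/$SITE/segments/* $MASTER/segments/
done

# Re-index each segment under master (I think the per-segment
# command is "bin/nutch index <segment>" -- correct me if not)
for SEG in $MASTER/segments/*; do
    bin/nutch index $SEG
done

The reason for keeping each crawls/$SITE around separately is that I can inspect or re-run a single site's crawl without touching the master.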