Subject: Re: How parse *only* specific URLs under a domain... -depth 1 -topN 1 does not work as desired
From: Matt Poff
Date: Fri, 3 Feb 2012 23:39:14 +1300
To: "user@nutch.apache.org"

> I'm not a nutch expert, but I would try to run a crawl with -depth 0.

Tried that, but a depth of zero generates no results at all.

On 3/02/2012, at 10:07 PM, Markus Jelsma wrote:

> you can inject the url's you want and use the noAdditions switch when updating
> the crawldb.

Thanks - that sounds perfect.
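Markus's suggestion (inject only the wanted URLs, then update the crawldb with -noAdditions so that outlinks discovered during parsing are never added) can be sketched as a short Python driver around the Nutch 1.x command-line tools. This is a minimal sketch, not a tested recipe. The `bin/nutch` path, the `crawl/crawldb`, `crawl/segments` and `urls` directories, and all helper names are hypothetical. The sketch also assumes a Nutch 1.x release whose `updatedb` accepts `-noAdditions`. By default it only prints the commands (a dry run).

```python
import os
import subprocess
from typing import List

# Hypothetical paths for a local Nutch 1.x install; adjust to your own layout.
NUTCH = "bin/nutch"
CRAWLDB = "crawl/crawldb"
SEGMENTS = "crawl/segments"
SEED_DIR = "urls"  # directory holding a plain-text file with one URL per line


def seed_commands() -> List[List[str]]:
    """Inject the wanted URLs into the crawldb, then generate a fetch list."""
    return [
        [NUTCH, "inject", CRAWLDB, SEED_DIR],
        [NUTCH, "generate", CRAWLDB, SEGMENTS],
    ]


def segment_commands(segment: str) -> List[List[str]]:
    """Fetch and parse one segment, then update the crawldb.

    -noAdditions makes updatedb refresh only URLs already in the crawldb,
    so outlinks found while parsing are never added and later generate
    rounds keep selecting only the injected URLs.
    """
    return [
        [NUTCH, "fetch", segment],
        [NUTCH, "parse", segment],
        [NUTCH, "updatedb", CRAWLDB, segment, "-noAdditions"],
    ]


def latest_segment(segments_dir: str) -> str:
    """Return the newest segment; generate names segments by timestamp (yyyyMMddHHmmss)."""
    names = sorted(os.listdir(segments_dir))
    return os.path.join(segments_dir, names[-1])


def run(cmds: List[List[str]], dry_run: bool) -> None:
    """Print each command; execute it too unless this is a dry run."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


def crawl_seeds_only(dry_run: bool = True) -> None:
    run(seed_commands(), dry_run)
    # In a dry run no segment exists yet, so use a placeholder path instead.
    if dry_run:
        segment = os.path.join(SEGMENTS, "<timestamp>")
    else:
        segment = latest_segment(SEGMENTS)
    run(segment_commands(segment), dry_run)


if __name__ == "__main__":
    crawl_seeds_only(dry_run=True)
```

Running it as-is only prints the five commands. Pass `dry_run=False` to execute them against a real install. The crawl is driven step by step here instead of through the one-shot crawl command, so the -depth and -topN options discussed above are not involved.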