From: Tim Pease
To: user@nutch.apache.org
Subject: purging 404 URLs with SolrClean
Date: Thu, 14 Jul 2011 15:07:08 -0600
Message-Id: <55A66D55-7E06-4377-B855-F03B1FCADCD8@gmail.com>

I've noticed that SolrClean does not mark URLs as purged from Solr. Will running the SolrClean task multiple times send the same URLs to Solr for deletion? If so, what is the best strategy to mark these documents in the crawl DB so they are not repeatedly deleted from Solr?

Blessings,
TwP