Subject: Re: Trouble running solrindexer from Nutch 1.4
From: Tim Pease
Date: Wed, 7 Dec 2011 15:45:01 -0700
To: user@nutch.apache.org

On Dec 7, 2011, at 3:17 PM, Chip Calhoun wrote:

> This is probably just down to my not waiting for a 1.4 tutorial, but here goes. I've always used the following two commands to run my crawl and then index to Solr:
> # bin/nutch crawl urls -dir crawl -depth 1 -topN 500000
> # bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
>
> In 1.3 that works great. But in 1.4, when I run Solrindex I get this:
> # bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
> SolrIndexer: starting at 2011-12-07 17:09:58
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/parse_data
> Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/parse_text
>
> Sure enough, those directories don't exist. But they didn't exist in 1.3 either. What am I missing?
>

The call signature for running the solrindex has changed.
The linkdb is now optional, so you need to denote it with a "-linkdb" flag on the command line.

bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

Blessings,
TwP
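
For anyone scripting this, here is a minimal sketch of a wrapper that assembles the 1.4-style command line. The function name build_solrindex_cmd and its third argument are hypothetical, made up for illustration; only the shape of the resulting command comes from the message above. It prints the command rather than running it, so it can be checked before use.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): prints a Nutch 1.4 solrindex
# command line for a given Solr URL and crawl directory.
# Usage: build_solrindex_cmd SOLR_URL CRAWL_DIR [yes|no]
build_solrindex_cmd() {
    solr_url=$1
    crawl_dir=$2
    use_linkdb=${3:-yes}   # assumption: include the linkdb unless told otherwise

    cmd="bin/nutch solrindex $solr_url $crawl_dir/crawldb"

    # In 1.4 the linkdb is optional, so it has to be introduced by -linkdb.
    if [ "$use_linkdb" = "yes" ]; then
        cmd="$cmd -linkdb $crawl_dir/linkdb"
    fi

    # The glob stays inside double quotes so it is printed, not expanded.
    printf '%s\n' "$cmd $crawl_dir/segments/*"
}

# Print the command for the crawl directory used in this thread.
build_solrindex_cmd http://127.0.0.1:8983/solr/ crawl
```

Passing "no" as the third argument leaves the linkdb out entirely, which the new signature allows.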