nutch-user mailing list archives

From Tim Pease <tim.pe...@gmail.com>
Subject Re: Trouble running solrindexer from Nutch 1.4
Date Wed, 07 Dec 2011 22:45:01 GMT

On Dec 7, 2011, at 3:17 PM, Chip Calhoun wrote:

> This is probably just down to my not waiting for a 1.4 tutorial, but here goes. I've always used the following two commands to run my crawl and then index to Solr:
> # bin/nutch crawl urls -dir crawl -depth 1 -topN 500000
> # bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
> 
> In 1.3 that works great. But in 1.4, when I run Solrindex I get this:
> # bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
> SolrIndexer: starting at 2011-12-07 17:09:58
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/parse_data
> Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/parse_text
> 
> Sure enough, those directories don't exist. But they didn't exist in 1.3 either. What am I missing?
> 

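The call signature for solrindex changed in 1.4. The linkdb is now optional, so you
need to mark it with a "-linkdb" flag on the command line. Without the flag, crawl/linkdb
is read as just another segment, which is why the indexer looks for crawl_fetch,
crawl_parse, parse_data and parse_text under it.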

bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
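
If you script this step, a small wrapper can add the flag only when a linkdb is actually
present. This is a rough sketch, not anything shipped with Nutch: the function name
solrindex_cmd is made up for illustration, and it assumes the usual crawl/crawldb,
crawl/linkdb and crawl/segments layout from the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): prints the 1.4-style solrindex
# command line, adding "-linkdb" only when a linkdb directory exists.
solrindex_cmd() {
    solr_url=$1     # e.g. http://127.0.0.1:8983/solr/
    crawl_dir=$2    # directory assumed to hold crawldb, linkdb and segments

    cmd="bin/nutch solrindex $solr_url $crawl_dir/crawldb"

    # The linkdb is optional in 1.4, so pass it only if it is there.
    if [ -d "$crawl_dir/linkdb" ]; then
        cmd="$cmd -linkdb $crawl_dir/linkdb"
    fi

    # The segments glob is printed unexpanded; it expands when the
    # printed command is actually run.
    printf '%s\n' "$cmd $crawl_dir/segments/*"
}
```

Calling solrindex_cmd http://127.0.0.1:8983/solr/ crawl prints the command so you can check
it before running it from runtime/local.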

Blessings,
TwP