On Dec 7, 2011, at 3:17 PM, Chip Calhoun wrote:
> This is probably just down to my not waiting for a 1.4 tutorial, but here goes. I've
always used the following two commands to run my crawl and then index to Solr:
> # bin/nutch crawl urls -dir crawl -depth 1 -topN 500000
> # bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
>
> In 1.3 that works great. But in 1.4, when I run Solrindex I get this:
> # bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
> SolrIndexer: starting at 2011-12-07 17:09:58
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/parse_data
> Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/parse_text
>
> Sure enough, those directories don't exist. But they didn't exist in 1.3 either. What
am I missing?
>
The call signature of the solrindex command has changed in 1.4. The linkdb is now an optional argument, so you need to denote it with a "-linkdb" flag on the command line:
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
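That also explains the error you saw: without the -linkdb flag, solrindex treats crawl/linkdb as a segment directory and goes looking for segment subdirectories (crawl_fetch, crawl_parse, parse_data, parse_text) inside it. A sketch of the full 1.4 sequence, using the same paths and Solr URL from your message:

```shell
# Crawl step, unchanged between 1.3 and 1.4:
bin/nutch crawl urls -dir crawl -depth 1 -topN 500000

# Index step for 1.4: the linkdb must now be passed with -linkdb,
# otherwise it is parsed as a segment path:
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
```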
Blessings,
TwP