This is probably just down to my not waiting for a 1.4 tutorial, but here goes. I've always
used the following two commands to run my crawl and then index to Solr:
# bin/nutch crawl urls -dir crawl -depth 1 -topN 500000
# bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
In 1.3 that works great. But in 1.4, when I run solrindex I get this:
# bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
SolrIndexer: starting at 2011-12-07 17:09:58
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/parse_data
Input path does not exist: file:/C:/apache/apache-nutch-1.4/runtime/local/crawl/linkdb/parse_text
Sure enough, those directories don't exist. But they didn't exist in 1.3 either, and indexing worked fine then. What am I missing?
Thanks,
Chip