nutch-user mailing list archives

From "Muhammad Rizwan" <muhammad.riz...@sigmatec.com.pk>
Subject Exception org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/nutch/1.4/runtime/local/crawl/segments/20111209174842/parse_data
Date Fri, 09 Dec 2011 07:40:12 GMT
Hi,

I am new to Nutch and have configured Nutch 1.4 on my Linux machine, following the tutorial at
<http://wiki.apache.org/nutch/NutchTutorial#A1_Setup_Nutch_from_binary_distribution>.

When I run this command to crawl my first website:
# bin/nutch crawl urls -dir crawl -depth 3 -topN 5

it starts working, and after a few seconds I get the following error:

 

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/nutch/1.4/runtime/local/crawl/segments/20111209174842/parse_data
Input path does not exist: file:/home/nutch/1.4/runtime/local/crawl/segments/20111209175156/parse_data
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
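
The trace shows LinkDb.invert failing because two segment directories have no parse_data subdirectory. To check which segments are affected, something like the following could be used. It is only a rough sketch: it assumes the segments are plain directories on the local filesystem under crawl/segments, and the function name segments_missing_parse_data is made up for illustration, not part of Nutch. It does not explain why parse_data is missing, only which segments lack it.

```python
import os


def segments_missing_parse_data(segments_dir):
    """Return the sorted names of segment directories under segments_dir
    that do not contain a parse_data subdirectory.

    Hypothetical helper for illustration; not part of Nutch.
    """
    missing = []
    for name in sorted(os.listdir(segments_dir)):
        segment = os.path.join(segments_dir, name)
        # Skip stray files; only directories are treated as segments.
        if not os.path.isdir(segment):
            continue
        if not os.path.isdir(os.path.join(segment, "parse_data")):
            missing.append(name)
    return missing
```

Run from /home/nutch/1.4/runtime/local, calling segments_missing_parse_data("crawl/segments") should list every segment that LinkDb.invert would reject for the same reason.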

 

Any idea what is going wrong here?

 

- Riz

