nutch-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Exception org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/nutch/1.4/runtime/local/crawl/segments/20111209174842/parse_data
Date Fri, 09 Dec 2011 12:11:38 GMT
Hi Riz,

Did you verify that Nutch is installed correctly?
http://wiki.apache.org/nutch/NutchTutorial#A2._Verify_your_Nutch_installation

If you have Nutch installed and correctly configured, there should be no
problems running it in local mode, as you are doing.
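
As a quick diagnostic, here is a minimal sketch (not part of Nutch) that lists
which segment directories lack a parse_data subdirectory, since that is the
path the exception quoted below complains about. The function name
segments_missing is hypothetical, and the crawl/segments path is an assumption
taken from the -dir crawl argument and the error message; adjust it to your
own layout.

```python
from pathlib import Path


def segments_missing(segments_dir, required="parse_data"):
    """Return the names of segment directories lacking the `required` subdirectory."""
    root = Path(segments_dir)
    if not root.is_dir():
        # No segments directory at all: nothing to report.
        return []
    return sorted(
        seg.name
        for seg in root.iterdir()
        if seg.is_dir() and not (seg / required).is_dir()
    )


if __name__ == "__main__":
    # Assumed location, relative to runtime/local, based on the error message.
    for name in segments_missing("crawl/segments"):
        print("segment without parse_data:", name)
```

Any segment it prints has no parse output, which is the input that
LinkDb.invert fails to find in the stack trace. Why the parse output is
missing is a separate question; the fetch and parse log entries for that
segment are the likely place to look.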

On Fri, Dec 9, 2011 at 7:40 AM, Muhammad Rizwan <
muhammad.rizwan@sigmatec.com.pk> wrote:

> Hi,
>
> I am new to Nutch and configured Nutch 1.4 using the tutorial here
> <http://wiki.apache.org/nutch/NutchTutorial#A1_Setup_Nutch_from_binary_distribution>
> on my Linux machine.
>
> Now when I run this command to crawl my first website:
> # bin/nutch crawl urls -dir crawl -depth 3 -topN 5
>
> It starts working, and after a few seconds I get the following error:
>
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
> Input path does not exist:
> file:/home/nutch/1.4/runtime/local/crawl/segments/20111209174842/parse_data
> Input path does not exist:
> file:/home/nutch/1.4/runtime/local/crawl/segments/20111209175156/parse_data
>        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
>        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
>        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
>        at org.apache.nutch.crawl.Crawl.run(Crawl.java:143)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>
> Any idea what is going wrong here?
>
> - Riz


-- 
*Lewis*
