Hi Riz,
Did you verify that Nutch is installed correctly?
http://wiki.apache.org/nutch/NutchTutorial#A2._Verify_your_Nutch_installation
If Nutch is installed and correctly configured, there should be no
problems running it in local mode as you are doing.
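
It may also be worth looking at the segments themselves, since the exception
says parse_data is missing from two of them. The following is only a minimal
sketch, assuming the crawl directory is "crawl" as in your command; the
function name missing_parse_data is made up for illustration and is not part
of Nutch. It lists every segment that has no parse_data directory:

```shell
#!/bin/sh
# List crawl segments that lack a parse_data directory.
# Assumption: the crawl directory layout is <crawl_dir>/segments/<timestamp>/,
# as shown in the paths of the quoted exception.
missing_parse_data() {
    crawl_dir="${1:-crawl}"
    for seg in "$crawl_dir"/segments/*/; do
        # Skip the unexpanded glob when no segments exist.
        [ -d "$seg" ] || continue
        if [ ! -d "${seg}parse_data" ]; then
            echo "missing parse_data: ${seg%/}"
        fi
    done
}

# Default to "crawl", the value passed to -dir in the crawl command.
missing_parse_data "${1:-crawl}"
```

Run it from runtime/local. Any segment it prints was probably fetched but
never parsed, which would explain why the LinkDb step cannot find its input.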
On Fri, Dec 9, 2011 at 7:40 AM, Muhammad Rizwan <
muhammad.rizwan@sigmatec.com.pk> wrote:
> Hi,
>
>
>
> I am new to Nutch and configured Nutch 1.4 using the tutorial here
> <http://wiki.apache.org/nutch/NutchTutorial#A1_Setup_Nutch_from_binary_distribution>
> on my Linux machine.
>
> Now when I run this command to crawl my first website
> # bin/nutch crawl urls -dir crawl -depth 3 -topN 5
>
>
>
> It starts working and after a few seconds I get the following error:
>
>
>
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
> Input path does not exist:
> file:/home/nutch/1.4/runtime/local/crawl/segments/20111209174842/parse_data
> Input path does not exist:
> file:/home/nutch/1.4/runtime/local/crawl/segments/20111209175156/parse_data
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
>     at org.apache.nutch.crawl.Crawl.run(Crawl.java:143)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>
>
>
> Any idea what is going wrong here?
>
>
>
> - Riz
>
>
--
*Lewis*