nutch-user mailing list archives

From remi tassing <tassingr...@gmail.com>
Subject Re: Exception org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/nutch/1.4/runtime/local/crawl/segments/20111209174842/parse_data
Date Fri, 23 Dec 2011 14:13:06 GMT
It looks like it's working now, muchos gracias Markus!!!
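
For anyone who finds this thread later: the cleanup Markus describes in the quoted replies (delete every segment that shows errors) can be sketched roughly as follows. This is only a sketch, not an official Nutch tool. It assumes a complete Nutch 1.x segment holds the six subdirectories listed in EXPECTED_SUBDIRS, and the function name find_incomplete_segments is made up for illustration. It only reports segments and deletes nothing. A segment that has been generated or fetched but not yet parsed would be flagged too, so check the list by hand before removing anything.

```python
from pathlib import Path

# Assumption: subdirectories a fully fetched and parsed Nutch 1.x segment contains.
EXPECTED_SUBDIRS = (
    "content",
    "crawl_fetch",
    "crawl_generate",
    "crawl_parse",
    "parse_data",
    "parse_text",
)


def find_incomplete_segments(segments_dir):
    """Map each segment directory to the list of expected subdirs it lacks.

    Segments that contain every expected subdirectory are left out of the result.
    Nothing is deleted; the caller decides what to do with the report.
    """
    incomplete = {}
    for segment in sorted(Path(segments_dir).iterdir()):
        if not segment.is_dir():
            continue  # ignore stray files next to the segment directories
        missing = [name for name in EXPECTED_SUBDIRS if not (segment / name).is_dir()]
        if missing:
            incomplete[segment] = missing
    return incomplete
```

Calling find_incomplete_segments("crawl/segments") from the runtime/local directory would list a segment such as 20111209174842 together with its missing parse_data, so the bad segments can be removed one by one instead of deleting the whole crawl folder.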

On Fri, Dec 23, 2011 at 3:59 PM, Markus Jelsma
<markus.jelsma@openindex.io> wrote:

> Yes, delete all segments/* that show errors. They are useless; only the
> crawl_generate subdir can be used again to restart the crawl from scratch.
>
> On Friday 23 December 2011 14:52:22 remi tassing wrote:
> > Just deleting the folders?
> >
> > On Fri, Dec 23, 2011 at 3:49 PM, Markus Jelsma
> > <markus.jelsma@openindex.io> wrote:
> > > You have to get rid of the bad segments. They cannot be recovered.
> > > With Nutch 1.x it is never a good idea to use extremely large
> > > segments that take days to run.
> > >
> > > On Friday 23 December 2011 14:45:39 remi tassing wrote:
> > > > My computer shut down yesterday and I'm having the same problem. The
> > > > problem this time is that I can't just delete everything and start
> > > > again. I've been crawling for days!
> > > >
> > > > Any other ways to handle this? Remove segments? Sanitize the
> > > > database?
> > > >
> > > > On Sat, Dec 10, 2011 at 3:54 PM, M.Rizwan
> > > > <muhammad.rizwan@sigmatec.com.pk> wrote:
> > > > > Thanks Remi. Yes, not a good solution, but this worked for me too.
> > > > >
> > > > > Thanks for sharing.
> > > > >
> > > > > On Fri, Dec 9, 2011 at 5:13 PM, remi tassing
> > > > > <tassingremi@gmail.com> wrote:
> > > > > > Sorry, I forgot to change the title...
> > > > > >
> > > > > > However I had the same error "Exception
> > > > > > org.apache.hadoop.mapred.InvalidInputException: Input path does
> > > > > > not exist:
> > > > > > file:/home/nutch/1.4/runtime/local/crawl/segments/..." this
> > > > > > morning.
> > > > > >
> > > > > > I believe it's because I stopped Nutch while it was crawling and
> > > > > > data were not saved properly.
> > > > > >
> > > > > > I couldn't find an alternative and just had to delete my "crawl"
> > > > > > folder, then it worked...Not a good solution!
> > > > > >
> > > > > > On Fri, Dec 9, 2011 at 2:08 PM, Lewis John Mcgibbney
> > > > > > <lewis.mcgibbney@gmail.com> wrote:
> > > > > > > Hi Remi,
> > > > > > >
> > > > > > > Please don't hijack someone's thread, start your own.
> > > > > > >
> > > > > > > Thank you
> > > > > > >
> > > > > > > Lewis
> > > > > > >
> > > > > > > On Fri, Dec 9, 2011 at 8:26 AM, remi tassing
> > > > > > > <tassingremi@gmail.com> wrote:
> > > > > > > > Hello guys,
> > > > > > > >
> > > > > > > > how do you use "org.apache.nutch.net.URLFilterChecker"? It's
> > > > > > > > not documented and it always shows me this "Checking
> > > > > > > > combination of all URLFilters available" and then gets stuck.
> > > > > > > >
> > > > > > > > Remi
> > > > > > >
> > > > > > > --
> > > > > > > *Lewis*
> > > > > >
> > > > > > --
> > > > > > Remi Tassing
> > >
> > > --
> > > Markus Jelsma - CTO - Openindex
>
> --
> Markus Jelsma - CTO - Openindex
>



-- 
Remi Tassing
