nutch-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: nutch crawling file system SOLVED
Date Sun, 11 Mar 2012 16:59:21 GMT
Hi Alessio,

If you check out our official tutorial, you will see no mention of
crawl-urlfilter.txt; it was deprecated after Nutch 1.2, IIRC.

I can only suggest that any other tutorial you are using is in need of an
update.

http://wiki.apache.org/nutch/NutchTutorial
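Because the rules now live only in regex-urlfilter.txt, the order of the lines in that file matters. The Java class below is a minimal sketch of the first-match behaviour described in that file's comments. It is not Nutch's own RegexURLFilter API. Each rule is `+` (accept) or `-` (reject) followed by a regex, the first rule whose pattern is found in the URL decides, and a URL matching no rule is dropped. The class name and the /data/pdfs directory are hypothetical. The sketch also shows why a file-system crawl fails with the stock `-^(file|ftp|mailto):` line still in front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of regex-urlfilter.txt semantics (not Nutch's API):
 * rules are checked in order, the first matching rule decides, and a URL
 * that matches no rule is rejected.
 */
public class UrlFilterSketch {

    private record Rule(boolean accept, Pattern pattern) {}

    private final List<Rule> rules = new ArrayList<>();

    public UrlFilterSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and comments, as in the config file format.
            if (line.isEmpty() || line.startsWith("#")) continue;
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns true if the URL would be kept by the first matching rule. */
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            // find(): the pattern may match anywhere unless anchored with ^.
            if (rule.pattern().matcher(url).find()) {
                return rule.accept();
            }
        }
        return false; // no rule matched, so the URL is dropped
    }

    public static void main(String[] args) {
        String url = "file:///data/pdfs/report.pdf"; // hypothetical path
        // Stock ordering: file: URLs are skipped before "+." is reached.
        UrlFilterSketch stock = new UrlFilterSketch(List.of(
                "-^(file|ftp|mailto):",
                "+."));
        // Local-filesystem variant: accept one directory, reject the rest.
        UrlFilterSketch local = new UrlFilterSketch(List.of(
                "+^file:///data/pdfs/",
                "-."));
        System.out.println(stock.accepts(url)); // false
        System.out.println(local.accepts(url)); // true
    }
}
```

In other words, rules copied from an old crawl-urlfilter.txt work in regex-urlfilter.txt as long as the accept rule for your file: root comes before any line that rejects file: URLs.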

On Sat, Mar 10, 2012 at 4:42 PM, alessio crisantemi <
alessio.crisantemi@gmail.com> wrote:

> I've partially solved it.
> Following the tutorial, I configured Nutch to crawl a local file
> system,
> thank you.
>
> But I have a doubt: why do all the tutorials and guides about Nutch talk
> about the crawl-urlfilter.txt file, when the default Nutch configuration
> doesn't have this file? If I put the rules the guide gives for
> crawl-urlfilter into regex-urlfilter, everything works.
> I would like to understand why.
> thank you
> alessio
>
> On 4 March 2012 at 17:02, alessio crisantemi <
> alessio.crisantemi@gmail.com> wrote:
>
> > Hi all,
> > I need to crawl a directory containing a lot of PDF files,
> > but I only know the step-by-step procedure for crawling a website.
> > How can I do it for a root directory?
> > Thank you for your help,
> > alessio
> >
>
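For the original question of crawling a directory rather than a website, the seed list simply contains the file: URL of the root directory instead of an http: URL. The Java helper below is a hypothetical sketch of building that seed entry and writing it to urls/seed.txt. The class name, the urls/seed.txt location and the /data/pdfs default are assumptions for illustration. The setup it assumes is `protocol-file` listed in `plugin.includes` in nutch-site.xml, plus a regex-urlfilter.txt that accepts the file: root. As I understand it, Nutch's file protocol then fetches the directory listing and follows it to the PDFs inside, so the root alone should be enough as a seed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical helper: writes a one-line Nutch seed list containing the
 * file: URL of a local directory (assumes protocol-file is enabled).
 */
public class SeedForDirectory {

    /** Converts a directory path to a file: URL that ends in '/'. */
    static String seedUrl(Path dir) {
        String uri = dir.toAbsolutePath().normalize().toUri().toString();
        // toUri() appends '/' only when the directory exists; enforce it anyway.
        return uri.endsWith("/") ? uri : uri + "/";
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the real PDF directory as the first argument.
        Path pdfRoot = Path.of(args.length > 0 ? args[0] : "/data/pdfs");
        Path seedDir = Path.of("urls");
        Files.createDirectories(seedDir);
        Files.write(seedDir.resolve("seed.txt"), List.of(seedUrl(pdfRoot)));
        System.out.println("Wrote seed: " + seedUrl(pdfRoot));
    }
}
```

The trailing slash keeps the seed consistent with a regex-urlfilter.txt rule such as `+^file:///data/pdfs/`.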



-- 
*Lewis*
