nutch-user mailing list archives

From lewis john mcgibbney <>
Subject Re: Error Line...
Date Fri, 08 Jul 2011 23:40:53 GMT

Please see this wiki page for the correct command line options [1]. I'm
assuming that you are using the crawl command; can you please confirm?

If not then please see this wiki page instead [2]

Well, you are correct, we are a bit thin on URLFilter documentation. Thanks
for raising this; it is now on the radar for a big update. However, you can
see some info on how URLFilters work here [3].

Your final points have to do with settings in nutch-default.xml, which you
are advised to copy to nutch-site.xml and change there. To control how soon
pages are recrawled, look at the fetch interval properties.
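
As a rough sketch of such an override, assuming the property in question is
db.fetch.interval.default (the default number of seconds between re-fetches,
shipped as 2592000, i.e. 30 days), nutch-site.xml might contain something like
the following. The value 86400 (one day) is only an illustrative choice, not a
recommendation:

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: values here override those in nutch-default.xml -->
<configuration>
  <property>
    <name>db.fetch.interval.default</name>
    <!-- Seconds between re-fetches of a page. The shipped default is
         2592000 (30 days); 86400 (1 day) is an illustrative value only. -->
    <value>86400</value>
  </property>
</configuration>
```

Check the description of this property in your own copy of nutch-default.xml
before relying on it, since names and defaults can differ between versions.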



On Fri, Jul 8, 2011 at 11:00 PM, Cupbearer <> wrote:

> I saw this in one of the other posts here and this comes up for me also so
> I was wondering why and what I can do to fix it?
> solrUrl is not set, indexing will be skipped...
> I also need to work on the URL filter and the tutorial wasn't very clear on
> how the whole recrawl thing works and how you invoke that early if you don't
> want to wait 30 days and so forth?
> Thanks,
> -----
> Cupbearer
> Jerry E. Craig, Jr.

