nutch-user mailing list archives

From lewis john mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Error Line...
Date Fri, 08 Jul 2011 23:40:53 GMT
Hi,

Please see this wiki page for the correct command line options [1]. I'm
assuming that you are using the crawl command; can you please confirm?

If not, then please see this wiki page instead [2].

You are correct that we are a bit thin on URLFilter documentation. Thanks
for raising this; it is now on the radar for a big update. In the meantime,
you can see some info on how URLFilters work here [3].

Your final points have to do with settings in nutch-default.xml, which you
are advised to copy to nutch-site.xml and adjust there. To change the
recrawl timing, look at the fetch interval properties.
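
As a rough sketch of what such an override might look like in
nutch-site.xml: this assumes the db.fetch.interval.default property (value
in seconds) as it appears in the 1.x nutch-default.xml, so please check the
name against your own copy. The one-day value is only an illustration, not
a recommendation.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: properties set here override nutch-default.xml -->
<configuration>
  <property>
    <name>db.fetch.interval.default</name>
    <!-- Assumed example value: 86400 seconds = 1 day.
         The shipped default is 2592000 seconds, i.e. the 30 days
         mentioned in the question. -->
    <value>86400</value>
  </property>
</configuration>
```

A new default interval should apply to pages as they are next fetched and
rescheduled; entries already in the crawldb keep their stored fetch time
until then.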

HTH

[1] http://wiki.apache.org/nutch/bin/nutch_crawl
[2] http://wiki.apache.org/nutch/bin/nutch%20solrindex
[3] http://wiki.apache.org/nutch/RegexURLFiltersBenchs



On Fri, Jul 8, 2011 at 11:00 PM, Cupbearer <jcraig@inforeverse.com> wrote:

> I saw this in one of the other posts here and this comes up for me also so I
> was wondering why and what I can do to fix it?
>
> solrUrl is not set, indexing will be skipped...
>
> I also need to work on the URL filter and the tutorial wasn't very clear on
> how the whole recrawl thing works and how you invoke that early if you don't
> want to wait 30 days and so forth?
>
> Thanks,
>
> -----
>
> Cupbearer
> Jerry E. Craig, Jr.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Error-Line-tp3153435p3153435.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
*Lewis*
