nutch-user mailing list archives

From "RJ" <ryanfi...@sympatico.ca>
Subject Re: urlfilter-db usage
Date Thu, 01 Dec 2005 15:42:08 GMT
 Hi Brent,

     Start here:
       http://wiki.media-style.com/display/nutchDocu/quick+tutorial

      After the URLs are injected, you only need to repeat the Generate, Fetch,
Update and Index steps of the above tutorial.
      Re: Generate:
             Generate builds a new segment from URLs that have not been crawled yet.
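
      To make the repeat cycle concrete, here is a rough Python sketch that only
builds the command lines for one round; it does not run anything. The command
names and argument order are as I recall them from the 0.7 whole-web tutorial,
so treat them as an assumption and check them against your install. The paths
"bin/nutch", "db" and "segments" and all the function names are made up for
illustration.

```python
from typing import List

# Assumed location of the Nutch launcher script, relative to the install dir.
NUTCH = "bin/nutch"


def generate_command(db_dir: str, segments_root: str) -> List[str]:
    """Command that asks Nutch to build a new segment (fetchlist) from the web db."""
    return [NUTCH, "generate", db_dir, segments_root]


def newest_segment(segment_names: List[str]) -> str:
    """Pick the segment that generate just created.

    Assumes segment directories are named by timestamp, so the
    lexicographically largest name is the most recent one.
    """
    return max(segment_names)


def per_segment_commands(db_dir: str, segment_path: str) -> List[List[str]]:
    """Fetch, Update and Index commands for a single segment, in order."""
    return [
        [NUTCH, "fetch", segment_path],
        [NUTCH, "updatedb", db_dir, segment_path],
        [NUTCH, "index", segment_path],
    ]


if __name__ == "__main__":
    # Hypothetical directory names; substitute your own.
    print(" ".join(generate_command("db", "segments")))
    segment = "segments/" + newest_segment(["20051130101010", "20051201154208"])
    for cmd in per_segment_commands("db", segment):
        print(" ".join(cmd))
```

      Each pass through Generate, Fetch and Update adds newly discovered URLs to
the db, so the next Generate has fresh pages to pick from.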

      That should get you started. I only started testing Nutch about a week
ago, so if anyone wants to add anything, feel free.

  Regards

----- Original Message ----- 
From: "Brent Parker" <fbparker@comcast.net>
To: <nutch-user@lucene.apache.org>
Sent: Thursday, December 01, 2005 12:44 AM
Subject: urlfilter-db usage


> Greetings,
>
> I'm a Nutch (0.7.1) newbie.  I have installed it - used the Intranet crawl,
> and all works fine. I want to crawl the web, using a relatively small list
> of domains. Therefore, I am interested in using the urlfilter-db plugin
> (http://issues.apache.org/jira/browse/NUTCH-100). I have downloaded the
> plugin. I was able to build and deploy with no problem. I set up the
> nutch-default.xml, nutch-site.xml, and mysql as specified in the plugin
> instructions. But how do I use (invoke) the plugin?
>
> I am using the tutorial (http://lucene.apache.org/nutch/tutorial.html) as my
> guide to do whole-web crawling.  Do I now start from the "Whole-web:
> Fetching" section?
>
> Just need a "little" guidance (I think).
>
> Thanks in advance!
> Brent


