nutch-user mailing list archives

From Ian Reardon <irnu...@gmail.com>
Subject Crawl some sites
Date Tue, 10 May 2005 22:02:05 GMT
I would like to crawl some specific sites with Nutch for content. I
will be manually looking for new sites all the time and would like to
add them to my index on a regular basis, say 1 or 2 a week. Can anyone
walk me through this in pseudo-steps?

I crawled some sites with Nutch by creating a flat file of URLs and
running the crawl command, which created the directories and DBs. But
when I tried to add a new site after the crawl, I got an error saying
the directory or DB already exists. Do I have to recrawl all my content
every time I add something? That is, delete the folder, add the new
site to my flat file, and crawl everything all over again? Thanks.
