nutch-dev mailing list archives

From Marko Bauhardt ...@101tec.com>
Subject Re: Can I add a url to be crawled without putting it in a file and feeding it to "Inject"?
Date Thu, 06 Aug 2009 10:06:19 GMT

On Aug 5, 2009, at 6:57 PM, Paul Tomblin wrote:

Hi Paul

> I want to do some specific crawling where I crawl one site with one
> set of urls to accept/reject, then reset to crawl another site with
> another set of urls to accept/reject, etc.

I'm not sure I understand what you mean, but if you want to crawl
specific URLs and exclude others, you can use the Black/White
URL Filter:
http://issues.apache.org/jira/browse/NUTCH-249

Apply the patch "bw.patch" and read the comment that explains how to use it.
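
To illustrate the general idea of black/white filtering (this is only a
rough sketch, not the code or API from the patch): a URL is rejected if
it matches any blacklist pattern, accepted if it then matches a whitelist
pattern, and rejected otherwise. The class name BlackWhiteUrlFilter and
its constructor are made up for this example, and it assumes the
convention that a rejected URL is returned as null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; NOT the API of the NUTCH-249 patch.
class BlackWhiteUrlFilter {
    private final List<Pattern> white = new ArrayList<>();
    private final List<Pattern> black = new ArrayList<>();

    BlackWhiteUrlFilter(List<String> whiteRegexes, List<String> blackRegexes) {
        for (String regex : whiteRegexes) {
            white.add(Pattern.compile(regex));
        }
        for (String regex : blackRegexes) {
            black.add(Pattern.compile(regex));
        }
    }

    // Returns the URL unchanged if it is accepted, or null if it is rejected.
    String filter(String url) {
        // The blacklist is checked first, so it wins over the whitelist.
        for (Pattern p : black) {
            if (p.matcher(url).find()) {
                return null;
            }
        }
        for (Pattern p : white) {
            if (p.matcher(url).find()) {
                return url;
            }
        }
        // Anything not explicitly whitelisted is rejected.
        return null;
    }
}
```

For the case of one site after another, you would build one such filter
per site, each with its own accept/reject patterns, and switch filters
between crawls.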


marko

