nutch-dev mailing list archives

From Paul Tomblin <ptomb...@xcski.com>
Subject Can I add a url to be crawled without putting it in a file and feeding it to "Inject"?
Date Wed, 05 Aug 2009 16:57:46 GMT
I want to do some targeted crawling: crawl one site with one set of
URL accept/reject patterns, then reset and crawl another site with a
different set, and so on.  I'm writing my own wrapper that puts the
accept/reject patterns into the Configuration, plus a URLFilter that
reads that configuration item to do the accepting/rejecting.  What I
don't see is how to make the crawl start at a given URL, other than
by creating a dir/url file containing that URL.  For this case that's
inefficient; I'd rather parse one file listing the URLs and the
accept/reject patterns for each, then say "inject this URL", run my
own generate/fetch/updatedb cycle, then inject the next and repeat.
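
Concretely, the filter I have in mind looks roughly like this -- a
minimal sketch, and the property names ("mycrawl.accept.pattern",
"mycrawl.reject.pattern") are just placeholders I picked, not
anything Nutch defines:

import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.net.URLFilter;

public class ConfiguredURLFilter implements URLFilter {

  private Configuration conf;
  private Pattern accept;   // keep URLs matching this, if set
  private Pattern reject;   // drop URLs matching this, if set

  public void setConf(Configuration conf) {
    this.conf = conf;
    String acceptRe = conf.get("mycrawl.accept.pattern");
    String rejectRe = conf.get("mycrawl.reject.pattern");
    accept = (acceptRe == null) ? null : Pattern.compile(acceptRe);
    reject = (rejectRe == null) ? null : Pattern.compile(rejectRe);
  }

  public Configuration getConf() {
    return conf;
  }

  // URLFilter contract: return the URL to accept it, null to reject.
  public String filter(String urlString) {
    if (reject != null && reject.matcher(urlString).find()) {
      return null;
    }
    if (accept == null || accept.matcher(urlString).find()) {
      return urlString;
    }
    return null;
  }
}

And for the inject step, the only workaround I've found so far is to
generate the seed directory programmatically and delete it afterwards,
which still goes through a file but at least hides it behind one call.
Something like this (paths and the wrapper class are illustrative):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.nutch.crawl.Injector;
import org.apache.nutch.util.NutchConfiguration;

public class SingleUrlInject {

  public static void injectOne(Configuration conf, Path crawlDb,
      String url) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    // Write the single seed URL into a throwaway directory,
    // since Injector reads its seeds from a directory.
    Path seedDir = new Path("/tmp/seed-" + System.currentTimeMillis());
    FSDataOutputStream out = fs.create(new Path(seedDir, "urls.txt"));
    out.writeBytes(url + "\n");
    out.close();
    try {
      new Injector(conf).inject(crawlDb, seedDir);
    } finally {
      fs.delete(seedDir, true);
    }
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = NutchConfiguration.create();
    // Per-site accept pattern consumed by the filter sketched above.
    conf.set("mycrawl.accept.pattern", "^http://example\\.com/");
    injectOne(conf, new Path("crawl/crawldb"), "http://example.com/");
  }
}

Is there a cleaner way to do this, without going through the
filesystem at all?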

-- 
http://www.linkedin.com/in/paultomblin
