nutch-user mailing list archives

From lewis john mcgibbney <>
Subject Re: crawling a list of urls
Date Thu, 07 Jul 2011 15:21:36 GMT
Hi C.B.,

This is way too vague. We really need more information about roughly
what kind of results you wish to get. It would be nearly impossible for
anyone to specify a solution to such an open-ended question.

Please elaborate.

Thank you

On Thu, Jul 7, 2011 at 12:56 PM, Cam Bazz <> wrote:

> Hello,
> I have a case where I need to crawl a list of exact URLs, somewhere
> in the range of 1 to 1.5M URLs.
> I have written those URLs to numerous files under /home/urls, i.e.
> /home/urls/1 /home/urls/2
> Then, using the crawl command, I am crawling to depth=1.
> Are there any recommendations or general guidelines that I should
> follow when making Nutch just fetch and index a list of URLs?
> Best Regards,
> C.B.

