nutch-dev mailing list archives

From Dennis Kubes <ku...@apache.org>
Subject Re: Retire the original Fetcher before the release?
Date Mon, 17 Mar 2008 14:36:59 GMT


Andrzej Bialecki wrote:
> Dennis Kubes wrote:
>> We continue to run on Fetcher1.
> 
> Since you're running large crawls, could you run one of them with 
> Fetcher2 and comment on the results? Note that Fetcher2 needs a lot 
> fewer threads than Fetcher - usually running a large crawl with < 100 
> threads is more than sufficient.

Excellent. It's about time to run another large fetch, so I will try it.
> 
>> What are the benefits of moving to Fetcher2?  Not opposed to it, just
>> hadn't thought about it yet, as Fetcher1 seemed to be working fine for us.
> 
> Politeness is implemented and enforced in Fetcher2 instead of in the
> protocol plugins. This means that the same blocking code can be reused
> for any protocol (ftp, file, etc.). Fetcher2 handles the "long tail"
> problem in a better way: the old Fetcher would spin-wait threads until
> the host becomes available, whereas Fetcher2 reuses threads to handle
> work items from other host queues. Fetcher2 follows a cleaner
> producer/consumer model with per-host queues, which makes it more
> suitable for extensions. Example: one of the extensions that I
> implemented in private code was to add host queue monitoring for rates
> of errors, types of errors, download speed, etc., and to adjust
> fetching parameters based on that. Implementing this in the old
> Fetcher would be a nightmare.

Funny, we were trying to get this info by tailing all of the log files 
:).  I can definitely see how the new model could be much more efficient.
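
To make the per-host queue idea above concrete, here is a minimal sketch.
It is not Fetcher2's actual code: the class HostQueueSketch, its methods,
and the fixed per-host delay are all hypothetical names and simplifications
chosen for illustration, and the clock is passed in explicitly so the
behaviour is easy to follow. The point it shows is that a worker asking for
work never waits on a busy host; it simply gets an item from the next host
that is ready.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of per-host queues with a politeness delay.
// Not taken from Nutch; names and structure are assumptions.
public class HostQueueSketch {

    /** URLs waiting for one host, plus the earliest time that host may be hit again. */
    static final class HostQueue {
        final Queue<String> urls = new ArrayDeque<>();
        long nextFetchTime = 0L;
    }

    private final Map<String, HostQueue> queues = new LinkedHashMap<>();
    private final long delayMs;

    HostQueueSketch(long delayMs) {
        this.delayMs = delayMs;
    }

    /** Producer side: file a URL under its host. */
    synchronized void add(String host, String url) {
        queues.computeIfAbsent(host, h -> new HostQueue()).urls.add(url);
    }

    /**
     * Consumer side: return a URL from the first host whose delay has
     * elapsed, or null if no host is ready. This never blocks, so a
     * worker thread can move on to other hosts instead of spin-waiting.
     */
    synchronized String poll(long now) {
        for (HostQueue q : queues.values()) {
            if (!q.urls.isEmpty() && now >= q.nextFetchTime) {
                q.nextFetchTime = now + delayMs; // enforce politeness here, not in a protocol plugin
                return q.urls.remove();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        HostQueueSketch s = new HostQueueSketch(1000);
        s.add("a.example", "http://a.example/1");
        s.add("a.example", "http://a.example/2");
        s.add("b.example", "http://b.example/1");
        System.out.println(s.poll(0));    // http://a.example/1
        System.out.println(s.poll(0));    // http://b.example/1 (a.example is still in its delay)
        System.out.println(s.poll(0));    // null (nothing ready)
        System.out.println(s.poll(1000)); // http://a.example/2
    }
}
```

Because the delay lives in the queue rather than in a protocol plugin, the
same check would apply to any protocol, and per-host statistics such as
error rates could be hung off the HostQueue object in the same place.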

Dennis
