nutch-user mailing list archives

Subject Re: Fetch task 100% done, but still fetching
Date Fri, 11 Apr 2008 01:54:29 GMT
Aha, thank you.  So the progress percentage is set early on.  Is that a good thing for both Nutch
and Hadoop in general?  If progress reaches 100% that early in the process, what happens when
you have a task that takes a *really* long time?  I suppose it's just a minor annoyance, since
one can always look at the completed/not completed status to see what the real task status is.


Sematext -- Lucene - Solr - Nutch

----- Original Message ----
From: Andrzej Bialecki
Sent: Thursday, April 10, 2008 5:55:37 PM
Subject: Re: Fetch task 100% done, but still fetching

Dennis Kubes wrote:
> I believe the percentage complete is set in hadoop, in the 
> TaskInProgress.recomputeProgressMethod() and then lines 570-595 in 
> JobInProgress.updateTaskStatus.

Correct - the percentage is ultimately derived from reading the current 
FileSplit, i.e. the part of the input fetchlist. When this reading 
process is completed, the percentage is set to 100%, even though a lot 
of URLs could still be queued and waiting to be fetched.
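
To illustrate the gap between the two numbers, here is a minimal sketch in
plain Java. It does not use the Hadoop or Nutch APIs; the class name
FetchProgressSketch and the methods inputProgress and fetchProgress are
hypothetical and only model the idea: reported progress follows how much of
the input has been read, not how many URLs have actually been fetched.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class FetchProgressSketch {

    /** Progress as seen from the input side: fraction of fetchlist records read. */
    public static float inputProgress(int recordsRead, int totalRecords) {
        if (totalRecords == 0) {
            return 1.0f; // nothing to read counts as done
        }
        return Math.min(1.0f, (float) recordsRead / totalRecords);
    }

    /** Progress as a user would expect it: fraction of URLs actually fetched. */
    public static float fetchProgress(int urlsFetched, int totalUrls) {
        if (totalUrls == 0) {
            return 1.0f;
        }
        return Math.min(1.0f, (float) urlsFetched / totalUrls);
    }

    public static void main(String[] args) {
        int totalUrls = 1000;
        Queue<String> fetchQueue = new ArrayDeque<>();

        // Reading the fetchlist is fast: every record is read and queued.
        int recordsRead = 0;
        for (int i = 0; i < totalUrls; i++) {
            fetchQueue.add("http://example.com/page" + i); // placeholder URLs
            recordsRead++;
        }

        // Fetching is slow: assume only a few URLs are done so far.
        int urlsFetched = 0;
        for (int i = 0; i < 50; i++) {
            fetchQueue.poll();
            urlsFetched++;
        }

        System.out.println("input progress: " + inputProgress(recordsRead, totalUrls));
        System.out.println("fetch progress: " + fetchProgress(urlsFetched, totalUrls));
        System.out.println("still queued:   " + fetchQueue.size());
    }
}
```

In this model the input-side figure is already at its maximum while most
URLs are still sitting in the queue, which matches the "100% done, but
still fetching" symptom.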

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com
