hadoop-common-user mailing list archives

From "Tanton Gibbs" <tanton.gi...@gmail.com>
Subject Re: Percent progress of map/reduce in JobClient
Date Wed, 04 Jun 2008 22:51:15 GMT
From what I've read, there are three reduce phases: 1. copy, 2. sort,
3. reduce.  From 0 to 33% is the copy phase.  I guess if a job doesn't
need that phase, it could skip it completely.
After 33%, it waits until it is done sorting before outputting status
again at 66%, then it updates regularly during the reduce phase to
100%.  This has been my experience, at least.
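
To make the arithmetic concrete, here is a minimal sketch of how a
three-phase progress figure like this could be computed, assuming each
phase counts for an equal third of the total.  The class, enum, and
method names are hypothetical, made up for illustration; this is not
Hadoop's actual code.

```java
// Hypothetical illustration only; none of these names come from Hadoop.
class ReduceProgress {
    // The three reduce phases, in the order they run.
    enum Phase { COPY, SORT, REDUCE }

    // Overall progress in [0, 1], assuming each phase is worth one third.
    // fractionDone is how far along the current phase is, from 0.0 to 1.0.
    static double overall(Phase phase, double fractionDone) {
        double clamped = Math.max(0.0, Math.min(1.0, fractionDone));
        return (phase.ordinal() + clamped) / Phase.values().length;
    }

    // Whole-number percentage, rounded down, as a status line would show it.
    static int percent(Phase phase, double fractionDone) {
        return (int) Math.floor(overall(phase, fractionDone) * 100.0);
    }

    public static void main(String[] args) {
        System.out.println("copy done:    " + percent(Phase.COPY, 1.0) + "%");
        System.out.println("sort done:    " + percent(Phase.SORT, 1.0) + "%");
        System.out.println("reduce start: " + percent(Phase.REDUCE, 0.0) + "%");
        System.out.println("reduce done:  " + percent(Phase.REDUCE, 1.0) + "%");
    }
}
```

Under that equal-thirds assumption, the end of the copy phase lands on
33% and the end of the sort phase on 66%, which lines up with a status
line that jumps straight to 66% when copy and sort finish quickly.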

Tanton

On Wed, Jun 4, 2008 at 4:19 PM, Stuart Sierra <mail@stuartsierra.com> wrote:
> How does Hadoop decide when to update the "percent complete" for
> map/reduce tasks?  I've been running a small job (~150 MB) on a
> pseudo-distributed cluster.  "bin/hadoop jar" prints:
>
> 08/06/04 17:02:16 INFO mapred.JobClient:  map 0% reduce 0%
> 08/06/04 17:05:52 INFO mapred.JobClient:  map 100% reduce 0%
> 08/06/04 17:06:05 INFO mapred.JobClient:  map 100% reduce 66%
> 08/06/04 17:06:10 INFO mapred.JobClient:  map 100% reduce 67%
> 08/06/04 17:06:17 INFO mapred.JobClient:  map 100% reduce 68%
>
> And so on until the job completes.  What seems odd is that I don't get
> any feedback at all on the progress of the map task until it reaches
> 100%, and I get no feedback on the reduce task until it reaches 66%.
> After that, I get updates every few seconds.  The TaskTracker shows
> the same thing.  What might cause this?
>
> This is Hadoop 0.17.  The input and output are both text, both ~140MB,
> gzip-compressed down to ~12MB.
>
> Thanks,
> -Stuart
>
