hadoop-common-user mailing list archives

From "Stuart Sierra" <m...@stuartsierra.com>
Subject Percent progress of map/reduce in JobClient
Date Wed, 04 Jun 2008 21:19:19 GMT
How does Hadoop decide when to update the "percent complete" for
map/reduce tasks?  I've been running a small job (~150 MB) on a
pseudo-distributed cluster.  "bin/hadoop jar" prints:

08/06/04 17:02:16 INFO mapred.JobClient:  map 0% reduce 0%
08/06/04 17:05:52 INFO mapred.JobClient:  map 100% reduce 0%
08/06/04 17:06:05 INFO mapred.JobClient:  map 100% reduce 66%
08/06/04 17:06:10 INFO mapred.JobClient:  map 100% reduce 67%
08/06/04 17:06:17 INFO mapred.JobClient:  map 100% reduce 68%

And so on until the job completes.  What seems odd is that I don't get
any feedback at all on the progress of the map task until it reaches
100%, and I get no feedback on the reduce task until it reaches 66%.
After that, I get updates every few seconds.  The TaskTracker shows
the same thing.  What might cause this?
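
To put numbers on those gaps, here is a minimal Python sketch that parses the JobClient lines quoted above and reports the seconds between consecutive updates. The helper names (parse_progress, gaps_in_seconds) are made up for illustration and are not part of Hadoop, and the regex assumes the timestamps are yy/MM/dd HH:mm:ss, which is how the log appears to read.

```python
import re
from datetime import datetime

# The JobClient output quoted in this message.
LOG = """\
08/06/04 17:02:16 INFO mapred.JobClient:  map 0% reduce 0%
08/06/04 17:05:52 INFO mapred.JobClient:  map 100% reduce 0%
08/06/04 17:06:05 INFO mapred.JobClient:  map 100% reduce 66%
08/06/04 17:06:10 INFO mapred.JobClient:  map 100% reduce 67%
08/06/04 17:06:17 INFO mapred.JobClient:  map 100% reduce 68%
"""

# Assumed line layout: timestamp, log level, logger name, then the percentages.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO mapred\.JobClient:"
    r"\s+map (\d+)% reduce (\d+)%$"
)


def parse_progress(log_text):
    """Return a list of (timestamp, map_percent, reduce_percent) tuples."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            # Assumption: the date is year/month/day with a two-digit year.
            stamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
            entries.append((stamp, int(match.group(2)), int(match.group(3))))
    return entries


def gaps_in_seconds(entries):
    """Return the whole seconds elapsed between consecutive progress lines."""
    return [
        int((later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(entries, entries[1:])
    ]


entries = parse_progress(LOG)
gaps = gaps_in_seconds(entries)
print(gaps)  # [216, 13, 5, 7]
```

So the map phase goes 216 seconds with no intermediate percentage, while the later reduce updates arrive 5 to 7 seconds apart.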

This is Hadoop 0.17.  The input and output are both text, both ~140MB,
gzip-compressed down to ~12MB.
