hadoop-common-user mailing list archives

From Jeff Bean <jwfb...@cloudera.com>
Subject Re: When does Reduce job start
Date Tue, 04 Jan 2011 23:14:31 GMT
By design, reduce() is not called until the map phase is complete. You're
seeing the reduce task report as started when the map phase is at 90%
because Hadoop is shuffling data from the mappers that have already
completed. As currently designed, you can't start reduce() early because
there is no way to guarantee you have all the values for any key until all
the mappers are done. reduce() requires a key and all the values for that
key in order to execute.
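
The distinction between a reduce task starting (shuffle) and reduce() being
called can be sketched with a small simulation. This is not Hadoop code: the
function run_job and its slowstart parameter are hypothetical names for
illustration only. The parameter is loosely modeled on the
mapred.reduce.slowstart.completed.maps property in Hadoop releases of this
era, which sets the fraction of maps that must finish before reduce tasks are
scheduled. Check mapred-default.xml for your version before relying on it.

```python
from collections import defaultdict


def run_job(map_outputs, slowstart=0.05):
    """Simulate a single reducer (illustrative only, not Hadoop's API).

    map_outputs: list of per-mapper outputs, in completion order; each
                 output is a list of (key, value) pairs.
    slowstart:   hypothetical knob; fraction of maps that must finish
                 before the reduce task is launched and starts copying.
    """
    total = len(map_outputs)
    shuffled = defaultdict(list)  # values copied to the reducer so far
    pending = []                  # finished map outputs not yet copied
    events = []
    reducer_started = False

    for done, output in enumerate(map_outputs, start=1):
        pending.append(output)
        if not reducer_started and done / total >= slowstart:
            reducer_started = True
            events.append(
                f"reduce task launched after {done}/{total} maps (shuffle only)"
            )
        if reducer_started:
            # Shuffle: copy whatever finished map output is available.
            for finished in pending:
                for key, value in finished:
                    shuffled[key].append(value)
            pending.clear()

    # Only now is every value for every key guaranteed to be present,
    # so this is the earliest point at which reduce() can safely run.
    events.append("all maps complete, calling reduce()")
    results = {key: sum(values) for key, values in sorted(shuffled.items())}
    return results, events


if __name__ == "__main__":
    outputs = [[("a", 1), ("b", 1)], [("a", 1)], [("b", 1), ("c", 1)], [("a", 1)]]
    results, events = run_job(outputs, slowstart=0.5)
    for line in events:
        print(line)
    print(results)
```

Lowering slowstart in this sketch only moves the launch and the copying
earlier; the reduce() step stays after the last map, because key "a" receives
a value from the final mapper and any earlier sum would have been incomplete.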


On Tue, Jan 4, 2011 at 10:53 AM, sagar naik <snaik@attributor.com> wrote:

> Hi All,
> Number of map tasks: 1000s
> Number of reduce tasks: single digit
> In such cases the reduce task won't start even when a few map tasks are
> completed.
> Example:
> In my observation of a sample run of bin/hadoop jar
> hadoop-*examples*.jar pi 10000 10, the reduce did not start until 90%
> of map tasks were complete.
> The only reason I can think of for not starting a reduce task is to
> avoid the unnecessary transfer of map output data in case of
> failures.
> Is there a way to quickly start the reduce task in such a case?
> What is the configuration parameter to change this behavior?
> Thanks,
> Sagar
