hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HADOOP-465) Jobtracker doesn't always spread reduce tasks evenly if (mapred.tasktracker.tasks.maximum > 1)
Date Tue, 22 Aug 2006 17:52:15 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-465?page=all ]

Doug Cutting resolved HADOOP-465.
---------------------------------

    Fix Version/s: 0.6.0
       Resolution: Duplicate

This was fixed in HADOOP-400.

> Jobtracker doesn't always spread reduce tasks evenly if (mapred.tasktracker.tasks.maximum > 1)
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-465
>                 URL: http://issues.apache.org/jira/browse/HADOOP-465
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Chris Schneider
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> I note that (at least for Nutch 0.8 Generator.Selector.reduce) if mapred.reduce.tasks
> is the same as the number of tasktrackers, and mapred.tasktracker.tasks.maximum is left at
> the default of 2, I typically have no reduce tasks running on a few of my tasktrackers, and
> two reduce tasks running on the same number of other tasktrackers.
> It seems like the jobtracker should assign reduce tasks to tasktrackers in a round robin
> fashion, so that the distribution will be spread as evenly as possible. The current implementation
> would seem to waste at least some time if one or more slave machines have to execute two reduce
> tasks simultaneously while other tasktrackers sit idle, with the amount of wasted time depending
> on how dependent the reduce tasks were on the slave machine's resources.
> I first thought that perhaps the jobtracker was "overloading" the tasktrackers that had
> already finished their map tasks (and avoiding those that were still mapping). However, as
> I understand it, the reduce tasks are all launched at the beginning of the job so that they
> are all ready and waiting for map output data when it first appears.
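
The difference between the uneven spread described in the report and the round-robin spread it asks for can be illustrated with a small sketch. This is not the JobTracker's code, and it does not claim to reproduce how Hadoop actually hands out tasks. The class name ReduceSpreadSketch and the methods fillFirst and roundRobin are hypothetical, and the "fill each tracker first" policy is only an assumed stand-in for whatever produced the imbalance.

```java
import java.util.Arrays;

// Illustrative only: hypothetical names, not part of Hadoop.
public class ReduceSpreadSketch {

    // Assumed stand-in for the uneven behaviour: give each tracker up to
    // slotsPerTracker reduce tasks before moving on to the next tracker.
    static int[] fillFirst(int trackers, int slotsPerTracker, int reduceTasks) {
        int[] load = new int[trackers];
        int remaining = reduceTasks;
        for (int t = 0; t < trackers && remaining > 0; t++) {
            int assigned = Math.min(slotsPerTracker, remaining);
            load[t] = assigned;
            remaining -= assigned;
        }
        return load;
    }

    // Round-robin spread as suggested in the report: one task per tracker
    // per pass, skipping trackers that have no free slot left.
    static int[] roundRobin(int trackers, int slotsPerTracker, int reduceTasks) {
        int[] load = new int[trackers];
        // Never try to place more tasks than the cluster has slots for.
        int remaining = Math.min(reduceTasks, trackers * slotsPerTracker);
        int t = 0;
        while (remaining > 0) {
            if (load[t] < slotsPerTracker) {
                load[t]++;
                remaining--;
            }
            t = (t + 1) % trackers;
        }
        return load;
    }

    public static void main(String[] args) {
        // 4 tasktrackers, 2 slots each (the default cited in the report),
        // and mapred.reduce.tasks equal to the number of tasktrackers.
        System.out.println(Arrays.toString(fillFirst(4, 2, 4)));  // [2, 2, 0, 0]
        System.out.println(Arrays.toString(roundRobin(4, 2, 4))); // [1, 1, 1, 1]
    }
}
```

With four tasktrackers and four reduce tasks, the fill-first stand-in leaves two trackers idle while two others run two reduces each, which matches the symptom in the report; the round-robin version places one reduce on every tracker.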

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
