hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-3751) Assign tasktrackers more than one task per heartbeat
Date Fri, 11 Jul 2008 17:34:31 GMT
Assign tasktrackers more than one task per heartbeat

                 Key: HADOOP-3751
                 URL: https://issues.apache.org/jira/browse/HADOOP-3751
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Arun C Murthy

Currently each TaskTracker gets one and only one new task to run per heartbeat. Also, a TaskTracker
immediately rushes to the JobTracker when a task completes without honouring the heartbeat
interval (default of 5s).

The problem with this is multi-fold:
1. This is a utilization bottleneck, especially when the TaskTracker just starts up. We should
be assigning at least 50% of its capacity.
2. If the individual tasks are very short, i.e. they run for less than the heartbeat interval, the
TaskTracker serially runs _one task at a time_.
3. For jobs with small maps, the TaskTracker never gets a chance to schedule reduces until
_all maps are complete_. This means shuffle doesn't overlap with maps at all, another sore point.
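
To put a rough number on point 1, here is a minimal sketch of the start-up ramp under a simplified model: the first heartbeat happens at t = 0, heartbeats then arrive exactly one interval apart, and no task finishes during the ramp. The class name, method name and the 4-slot figure are hypothetical and chosen only for illustration; the 5s interval is the default mentioned above.

```java
// Hypothetical back-of-the-envelope model, not Hadoop code.
public class RampUpSketch {

    /**
     * Seconds until every slot holds a task, when each heartbeat hands out
     * at most tasksPerHeartbeat tasks. Assumes slots >= 1 and
     * tasksPerHeartbeat >= 1, and that the first heartbeat is at t = 0.
     */
    static int secondsToFill(int slots, int tasksPerHeartbeat, int heartbeatSeconds) {
        // Ceiling division: number of heartbeats needed to cover all slots.
        int heartbeatsNeeded = (slots + tasksPerHeartbeat - 1) / tasksPerHeartbeat;
        // The first heartbeat costs no waiting time; each later one costs one interval.
        return (heartbeatsNeeded - 1) * heartbeatSeconds;
    }

    public static void main(String[] args) {
        System.out.println(secondsToFill(4, 1, 5)); // one task per heartbeat: 15
        System.out.println(secondsToFill(4, 2, 5)); // 50% of capacity per heartbeat: 5
        System.out.println(secondsToFill(4, 4, 5)); // full capacity in one heartbeat: 0
    }
}
```

Under these assumptions a 4-slot TaskTracker needs 15s to become fully busy with one task per heartbeat, against 5s if half its capacity is assigned at a time.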

Overall, the right approach is to let the TaskTracker advertise the number of available map
and reduce slots in each heartbeat, and the JobTracker (i.e. the Scheduler - HADOOP-3412/HADOOP-3445)
should decide how many tasks, and which maps/reduces, the TaskTracker should be assigned. Also,
we should ensure that the TaskTracker doesn't run to the JobTracker every time a task completes
- maybe we should hard-limit it to the heartbeat interval, or maybe run to the JobTracker only when
more than one task has completed in a given heartbeat interval, etc.
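
As a starting point for the discussion, the slot-advertising idea could be sketched roughly as follows. This is not the existing Hadoop API: every class, field and method name here (Heartbeat, Scheduler, assignTasks, and so on) is hypothetical, and the FIFO, maps-then-reduces policy is just a placeholder for whatever the real Scheduler decides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical names throughout; a sketch of the proposal, not Hadoop code.
public class HeartbeatSketch {

    enum TaskType { MAP, REDUCE }

    /** What the TaskTracker would advertise in each heartbeat. */
    static final class Heartbeat {
        final int freeMapSlots;
        final int freeReduceSlots;

        Heartbeat(int freeMapSlots, int freeReduceSlots) {
            this.freeMapSlots = freeMapSlots;
            this.freeReduceSlots = freeReduceSlots;
        }
    }

    static final class Task {
        final TaskType type;
        final int id;

        Task(TaskType type, int id) {
            this.type = type;
            this.id = id;
        }
    }

    /** Stand-in for the JobTracker's scheduler. */
    static final class Scheduler {
        private final Queue<Task> pendingMaps = new ArrayDeque<>();
        private final Queue<Task> pendingReduces = new ArrayDeque<>();

        void addTask(Task task) {
            (task.type == TaskType.MAP ? pendingMaps : pendingReduces).add(task);
        }

        /** Answers one heartbeat with up to one task per advertised free slot. */
        List<Task> assignTasks(Heartbeat heartbeat) {
            List<Task> assigned = new ArrayList<>();
            drain(pendingMaps, heartbeat.freeMapSlots, assigned);
            drain(pendingReduces, heartbeat.freeReduceSlots, assigned);
            return assigned;
        }

        // Moves at most `slots` tasks from the pending queue to the response.
        private static void drain(Queue<Task> pending, int slots, List<Task> out) {
            for (int i = 0; i < slots && !pending.isEmpty(); i++) {
                out.add(pending.poll());
            }
        }
    }

    public static void main(String[] args) {
        Scheduler scheduler = new Scheduler();
        for (int i = 0; i < 5; i++) {
            scheduler.addTask(new Task(TaskType.MAP, i));
        }
        scheduler.addTask(new Task(TaskType.REDUCE, 100));

        // A tracker with 2 free map slots and 1 free reduce slot gets 3 tasks at once.
        List<Task> assigned = scheduler.assignTasks(new Heartbeat(2, 1));
        System.out.println(assigned.size() + " tasks assigned in one heartbeat");
    }
}
```

Because reduces are handed out from their own slot count, a reduce can be scheduled while maps are still pending, which is what lets shuffle overlap with maps.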

Let's discuss.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
