hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3751) Assign tasktrackers more than one task per hearbeat
Date Fri, 11 Jul 2008 18:46:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612956#action_12612956 ]

Hemanth Yamijala commented on HADOOP-3751:
------------------------------------------

Arun, isn't this the same as HADOOP-3136?

> Assign tasktrackers more than one task per hearbeat
> ---------------------------------------------------
>
>                 Key: HADOOP-3751
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3751
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>
> Currently each TaskTracker gets one and only one new task to run per
> heartbeat. Also, a TaskTracker immediately rushes to the JobTracker when a
> task completes, without honouring the heartbeat interval (default of 5s).
> The problem with this is multi-fold:
> 1. This is a utilization bottleneck, especially when the TaskTracker just
>    starts up. We should be assigning at least 50% of its capacity.
> 2. If the individual tasks are very short, i.e. they run for less than the
>    heartbeat interval, the TaskTracker runs them serially, _one task at a
>    time_.
> 3. For jobs with small maps, the TaskTracker never gets a chance to schedule
>    reduces until _all maps are complete_. This means the shuffle doesn't
>    overlap with the maps at all, another sore point.
> Overall, the right approach is to let the TaskTracker advertise the number
> of available map and reduce slots in each heartbeat, and the JobTracker
> (i.e. the Scheduler - HADOOP-3412/HADOOP-3445) should decide how many tasks,
> and which maps/reduces, the TaskTracker should be assigned. Also, we should
> ensure that the TaskTracker doesn't run to the JobTracker every time a task
> completes - maybe we should hard-limit it to the heartbeat interval, or
> maybe run to the JobTracker only when more than one task has completed in a
> given heartbeat interval, etc.
> Let's discuss.
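
The proposal quoted above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not Hadoop's actual code: the class
HeartbeatSketch, the SlotReport type, and the methods assignTasks and
shouldHeartbeat are all hypothetical names invented for this example. The
scheduler policy shown (fill every free slot, maps first) is just one of the
choices the JobTracker could make, and the throttle implements the "more than
one completed task" option mentioned in the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in Hadoop.
class HeartbeatSketch {

    /** Hypothetical heartbeat payload: the free slots a TaskTracker advertises. */
    static final class SlotReport {
        final int freeMapSlots;
        final int freeReduceSlots;

        SlotReport(int freeMapSlots, int freeReduceSlots) {
            this.freeMapSlots = freeMapSlots;
            this.freeReduceSlots = freeReduceSlots;
        }
    }

    /**
     * JobTracker side (assumed policy): hand out as many pending tasks as the
     * tracker has free slots of each kind, instead of one task per heartbeat.
     * Tasks are represented by the placeholder strings "map" and "reduce".
     */
    static List<String> assignTasks(SlotReport report, int pendingMaps, int pendingReduces) {
        List<String> assigned = new ArrayList<>();
        int maps = Math.min(report.freeMapSlots, pendingMaps);
        int reduces = Math.min(report.freeReduceSlots, pendingReduces);
        for (int i = 0; i < maps; i++) {
            assigned.add("map");
        }
        for (int i = 0; i < reduces; i++) {
            assigned.add("reduce");
        }
        return assigned;
    }

    /**
     * TaskTracker side (assumed policy): contact the JobTracker when the
     * heartbeat interval has elapsed, or early only if more than one task
     * has completed since the last heartbeat.
     */
    static boolean shouldHeartbeat(long millisSinceLast, long intervalMillis, int completedSinceLast) {
        return millisSinceLast >= intervalMillis || completedSinceLast > 1;
    }
}
```

With this shape, a tracker reporting two free map slots and one free reduce
slot can receive three tasks in a single heartbeat, so reduces can start while
maps are still pending, and a single task completion no longer triggers an
immediate trip to the JobTracker.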

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

