hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3136) Assign multiple tasks per TaskTracker heartbeat
Date Wed, 13 Aug 2008 14:30:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622221#action_12622221 ]

Devaraj Das commented on HADOOP-3136:

I think that we should give fewer tasks of a particular type (map/reduce) when the corresponding
load of that type is low. For example, if a TT asks for task(s) to run and, at that point
in time, we have 200 as the remaining-map-load and 1000 free map slots, we should give out
just 1 task to this TT. The logic in the patch would give out more than that (to be precise,
it would be the number of available slots on that TT), right? Also, should we consider the
"free" slots as opposed to the total number of slots in the calculation for maxMapLoad (or

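One way to read the suggestion above is to scale a tracker's free slots by the ratio of remaining load to free slots in the cluster. The sketch below is only an illustration of that idea, not the logic in the attached patches: the class name, the method name, the parameters, and the ceiling-based rounding are all assumptions made for the example.

```java
// Hypothetical sketch; none of these names come from the HADOOP-3136 patches.
public final class LoadAwareAssignment {
    private LoadAwareAssignment() {}

    /**
     * Returns how many map tasks to hand to one TaskTracker in a heartbeat.
     *
     * @param remainingMapLoad    map tasks still waiting to be scheduled (assumed meaning)
     * @param clusterFreeMapSlots free map slots across the whole cluster
     * @param freeSlotsOnTracker  free map slots on the tracker that sent the heartbeat
     */
    public static int tasksToAssign(int remainingMapLoad,
                                    int clusterFreeMapSlots,
                                    int freeSlotsOnTracker) {
        if (remainingMapLoad <= 0 || clusterFreeMapSlots <= 0 || freeSlotsOnTracker <= 0) {
            return 0;
        }
        // ceil(freeSlotsOnTracker * remainingMapLoad / clusterFreeMapSlots) in integer math.
        long scaled = ((long) freeSlotsOnTracker * remainingMapLoad + clusterFreeMapSlots - 1)
                / clusterFreeMapSlots;
        // Never exceed the tracker's free slots or the work that is actually left.
        long capped = Math.min(scaled, Math.min(freeSlotsOnTracker, remainingMapLoad));
        return (int) capped;
    }
}
```

With the numbers from the comment (remaining-map-load 200, 1000 free map slots) and an assumed tracker with 2 free slots, `LoadAwareAssignment.tasksToAssign(200, 1000, 2)` returns 1, whereas handing out every available slot would return 2.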
> Assign multiple tasks per TaskTracker heartbeat
> -----------------------------------------------
>                 Key: HADOOP-3136
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3136
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Arun C Murthy
>             Fix For: 0.19.0
>         Attachments: HADOOP-3136_0_20080805.patch, HADOOP-3136_1_20080809.patch
> In today's logic of finding a new task, we assign only one task per heartbeat.
> We probably could give the tasktracker multiple tasks subject to the max number of free
> slots it has - for maps we could assign it data local tasks. We could probably run some logic
> to decide what to give it if we run out of data local tasks (e.g., tasks from overloaded racks,
> tasks that have least locality, etc.). In addition to maps, if it has reduce slots free, we
> could give it reduce task(s) as well. Again for reduces we could probably run some logic to
> give more tasks to nodes that are closer to nodes running most maps (assuming data generated
> is proportional to the number of maps). For e.g., if rack1 has 70% of the input splits, and
> we know that most maps are data/rack local, we try to schedule ~70% of the reducers there.
> Thoughts?

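The rack-proportional reducer idea in the issue description (about 70% of the reducers on a rack that holds 70% of the input splits) can be sketched as a proportional split with largest-remainder rounding. This is a hedged illustration only: the class, the method, the map-based input, and the rounding rule are assumptions for the example and are not part of the issue or its patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; these names do not exist in Hadoop.
public final class RackReducerShare {
    private RackReducerShare() {}

    /**
     * Splits totalReducers across racks in proportion to each rack's
     * (non-negative) input-split count, using largest-remainder rounding
     * so the per-rack counts add up to totalReducers.
     */
    public static Map<String, Integer> reducersPerRack(Map<String, Integer> splitsPerRack,
                                                       int totalReducers) {
        long totalSplits = 0;
        for (int s : splitsPerRack.values()) {
            totalSplits += s;
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        if (totalSplits <= 0 || totalReducers <= 0) {
            for (String rack : splitsPerRack.keySet()) {
                result.put(rack, 0);
            }
            return result;
        }
        // First pass: give each rack the floor of its proportional share.
        Map<String, Long> remainders = new LinkedHashMap<>();
        int assigned = 0;
        for (Map.Entry<String, Integer> e : splitsPerRack.entrySet()) {
            long product = (long) totalReducers * e.getValue();
            int base = (int) (product / totalSplits);
            result.put(e.getKey(), base);
            remainders.put(e.getKey(), product % totalSplits);
            assigned += base;
        }
        // Second pass: hand leftover reducers to the racks with the largest remainders.
        for (int left = totalReducers - assigned; left > 0; left--) {
            String best = null;
            long bestRemainder = -1;
            for (Map.Entry<String, Long> e : remainders.entrySet()) {
                if (e.getValue() > bestRemainder) {
                    best = e.getKey();
                    bestRemainder = e.getValue();
                }
            }
            result.put(best, result.get(best) + 1);
            remainders.put(best, -1L); // mark as already bumped
        }
        return result;
    }
}
```

For a job with 10 reducers and split counts of 70 on rack1 and 30 on rack2, this yields 7 reducers for rack1 and 3 for rack2. It treats the split counts as the only input; the description's further assumption that most maps are data/rack local is not modeled here.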
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
