hadoop-mapreduce-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: modify a specific TaskTracker's Map slots on the fly - FairScheduler?
Date Thu, 29 Sep 2011 15:30:06 GMT
You could increase the heartbeat frequency. The interval is set to 3 sec by default, although
it may be longer on larger clusters, so your data is likely to be only about 3 sec out of date.

--Bobby Evans

On 9/29/11 7:52 AM, "Ben Clay" <rbclay@ncsu.edu> wrote:

I need to modify the number of Map slots on a TaskTracker dynamically over the course of a
job without restarting the TaskTracker process.  If the number of allowed slots is lowered,
the current Map tasks should be allowed to finish, while new tasks should be prevented until
the count falls below the limit.
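
A minimal sketch of the rule described above, for illustration only. The class MapSlotCap
and its methods are hypothetical names, not part of Hadoop or the FairScheduler, and the
running-task count is assumed to be supplied by the caller. Lowering a cap never kills
running tasks; it only blocks new assignments until the count drops below the cap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper, not a Hadoop class: per-host cap on running Map tasks. */
class MapSlotCap {
    // Caps that have been overridden at runtime, keyed by TaskTracker hostname.
    private final Map<String, Integer> caps = new ConcurrentHashMap<>();
    // Cap used for hosts without an override (e.g. the configured slot maximum).
    private final int defaultCap;

    MapSlotCap(int defaultCap) {
        this.defaultCap = defaultCap;
    }

    /** Change a host's cap on the fly; 0 turns the host off for new Map tasks. */
    void setCap(String host, int cap) {
        if (cap < 0) {
            throw new IllegalArgumentException("cap must be >= 0");
        }
        caps.put(host, cap);
    }

    /** A new Map task is allowed only while the running count is below the cap. */
    boolean mayAssign(String host, int runningMaps) {
        return runningMaps < caps.getOrDefault(host, defaultCap);
    }
}
```

With a default of 4 and a host lowered to 2 while 3 tasks are running, mayAssign returns
false until two of those tasks finish. The check is only as accurate as the running count
passed in.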

Conceptually, a custom scheduler should work for this, and I have a modified FairScheduler
working which allows me to "turn off" TaskTrackers, disallowing new Map task assignments.
To do so, the target TaskTracker's hostname is placed in a refreshable config file, and canAssignMap()
always returns false for that hostname.
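
The "turn off" mechanism could look roughly like the sketch below. It is not the actual
modified FairScheduler: DisabledTrackers, reload() and allowsMaps() are hypothetical names,
the one-hostname-per-line file format is an assumption, and the real canAssignMap()
signature is not shown in this message. The caller is assumed to read the config file and
pass in its lines whenever a refresh is triggered.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not a Hadoop class: refreshable set of disabled hosts. */
class DisabledTrackers {
    // Replaced wholesale on reload so readers never see a half-built set.
    private volatile Set<String> disabled = Collections.emptySet();

    /** Rebuild the set from config lines (assumed format: one hostname per line). */
    void reload(Collection<String> lines) {
        Set<String> next = new HashSet<>();
        for (String line : lines) {
            String host = line.trim();
            // Skip blank lines and '#' comments (an assumed convention).
            if (!host.isEmpty() && !host.startsWith("#")) {
                next.add(host);
            }
        }
        disabled = Collections.unmodifiableSet(next);
    }

    /** Mirrors the check described above: false for any listed hostname. */
    boolean allowsMaps(String host) {
        return !disabled.contains(host);
    }
}
```

Swapping in a fresh immutable set keeps the check lock-free, which matters if it is called
on every assignment decision.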

The problem is that when I want to raise / lower the Map slots to some value in between 0
and mapred.tasktracker.map.tasks.maximum, I need to know the currently-held number of Map
tasks.  However, I can't get accurate information about the current number of slots occupied.
The following function calls all appear to return "stale" info:


I've concluded these are stale because I can see multiple quick-succession calls to canAssignMap()
yield the same value for these counter functions, even though new tasks have been assigned.
I thought about keeping track of the number of assigned tasks within canAssignMap() itself,
but there is unfortunately no way to tell when tasks have been completed, making this moot.

Is there another approach that would work in this situation?  It doesn't have to be via the
FairScheduler.  OR, is there a way to speed up the frequency of TaskTracker reports, so that
my scheduler has semi-accurate slot info?


