From: Robert Evans
To: "mapreduce-user@hadoop.apache.org"
Date: Thu, 29 Sep 2011 09:50:52 -0700
Subject: Re: modify a specific TaskTracker's Map slots on the fly - FairScheduler?

Ben,

I am not completely sure how it all works, but mapred.heartbeats.in.second is the config. It controls how many heartbeats the JT will be able to process in a second from all nodes. I think the 3 sec default is actually for the NN/DN, not the JT. So the number of heartbeats per second a node sends to the JT is mapred.heartbeats.in.second / (number of nodes).

--Bobby Evans
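Taking Bobby's description at face value, each node's share of the JobTracker's heartbeat budget shrinks as the cluster grows, so the interval between one tracker's reports grows linearly with the node count. A minimal sketch of that arithmetic, assuming the budget is exactly mapred.heartbeats.in.second and ignoring any minimum interval or scaling the JobTracker may also apply; the class and method names are hypothetical:

```java
/** Hypothetical helper for the heartbeat arithmetic described in the thread. */
final class HeartbeatMath {
    private HeartbeatMath() {}

    private static void check(int heartbeatsInSecond, int nodes) {
        if (heartbeatsInSecond <= 0 || nodes <= 0) {
            throw new IllegalArgumentException("both values must be positive");
        }
    }

    /** Heartbeats per second a single node sends: the JT-wide budget split across all nodes. */
    static double perNodeHeartbeatsPerSecond(int heartbeatsInSecond, int nodes) {
        check(heartbeatsInSecond, nodes);
        return (double) heartbeatsInSecond / nodes;
    }

    /** Seconds between two heartbeats from the same node, i.e. the reciprocal of the rate above. */
    static double intervalSeconds(int heartbeatsInSecond, int nodes) {
        check(heartbeatsInSecond, nodes);
        return (double) nodes / heartbeatsInSecond;
    }
}
```

With a value of 100 and 300 TaskTrackers, for instance, each tracker would report about every 3 seconds, and that interval is roughly how stale a scheduler's view of the tracker can get. Treat the result as a rough estimate only, since the real JobTracker logic may clamp or scale it.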

On 9/29/11 11:16 AM, "Ben Clay" <rbclay@ncsu.edu> wrote:

Bobby-
 
Is there a config option for reducing the heartbeat interval? I think I may have a potential solution, but decreasing the heartbeat interval would reduce the upper bound on task launch delays.
 

-Ben
 

From: Robert Evans [mailto:evans@yahoo-inc.com]
Sent: Thursday, September 29, 2011 11:30 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: modify a specific TaskTracker's Map slots on the fly - FairScheduler?

You could increase the heartbeat frequency, although the interval is 3 sec by default (possibly longer on larger clusters), so your data is likely to be only about 3 sec out of date.

--Bobby Evans

On 9/29/11 7:52 AM, "Ben Clay" <rbclay@ncsu.edu> wrote:
I need to modify the number of Map slots on a TaskTracker dynamically over the course of a job without restarting the TaskTracker process. If the number of allowed slots is lowered, the current Map tasks should be allowed to finish, while new tasks should be prevented until the count falls below the limit.
 
Conceptually, a custom scheduler should work for this, and I have a modified FairScheduler working which allows me to “turn off” TaskTrackers, disallowing new Map task assignments. To do so, the target TaskTracker’s hostname is placed in a refreshable config file, and canAssignMap() always returns false for that hostname.
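Ben's requirement (running maps finish, new ones are held back until the count drops below the limit) reduces to one comparison once a per-host limit is known, and a limit of 0 reproduces his existing “turn off” behaviour. A minimal sketch, assuming a hypothetical refreshable file of "hostname=limit" lines, with unlisted hosts falling back to mapred.tasktracker.map.tasks.maximum; the class and its methods are illustrative only and do not match the FairScheduler or LoadManager API:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-host Map slot limits, e.g. reloaded from a refreshable file. */
final class MapSlotLimits {
    // Stand-in for mapred.tasktracker.map.tasks.maximum; hosts not listed get this limit.
    private final int defaultMax;
    // Replaced wholesale on reload, so readers always see a complete map.
    private volatile Map<String, Integer> limits = new HashMap<>();

    MapSlotLimits(int defaultMax) {
        this.defaultMax = defaultMax;
    }

    /** Parses lines of the assumed form "hostname=limit"; blank lines and '#' comments are skipped. */
    void reload(Iterable<String> lines) {
        Map<String, Integer> fresh = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected hostname=limit: " + line);
            }
            int limit = Integer.parseInt(parts[1].trim());
            // Clamp to [0, defaultMax]: 0 turns the host off, and no host exceeds its configured maximum.
            fresh.put(parts[0].trim(), Math.max(0, Math.min(limit, defaultMax)));
        }
        limits = fresh;
    }

    int limitFor(String host) {
        return limits.getOrDefault(host, defaultMax);
    }

    /**
     * Admission rule: a new map is allowed only while occupied slots are below the limit.
     * Lowering a limit never kills running maps; it only blocks new ones until enough finish.
     */
    boolean allowsNewMap(String host, int occupiedMapSlots) {
        return occupiedMapSlots < limitFor(host);
    }
}
```

The rule is only as good as the occupied-slot count passed in, which is exactly the number Ben reports as stale.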
 
The problem is that when I want to raise / lower the Map slots to some value between 0 and mapred.tasktracker.map.tasks.maximum, I need to know the currently-held number of Map tasks. However, I can’t get accurate information about the current number of slots occupied. The following function calls all appear to return “stale” info:
 
TaskTrackerStatus.countOccupiedMapSlots()
TaskTrackerStatus.getAvailableMapSlots()
TaskTrackerStatus.countMapTasks()
 
I’ve concluded these are stale because I can see multiple quick-succession calls to canAssignMap() yield the same value for these counter functions, even though new tasks have been assigned. I thought about keeping track of the number of assigned tasks within canAssignMap() itself, but there is unfortunately no way to tell when tasks have been completed, making this moot.
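One reading of the staleness Ben observes is that the TaskTrackerStatus counters are a snapshot from the tracker's most recent heartbeat, so assignments made before the next heartbeat are invisible to them. A minimal sketch of compensating for that, assuming the scheduler can tell when a fresh status arrives for a host (here a hypothetical onHeartbeat callback) and that each new report already includes every task assigned before it: the scheduler adds its own not-yet-reported assignments to the last reported count. Under those assumptions completions need no separate signal, because they surface as a lower reported count at the next heartbeat, when the pending count is reset. All names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical bookkeeping: last reported occupancy plus assignments made since. */
final class MapSlotAccounting {
    private static final class HostState {
        int reportedOccupied;    // from the most recent status report
        int assignedSinceReport; // maps handed out after that report
    }

    private final Map<String, HostState> hosts = new HashMap<>();

    private HostState state(String host) {
        return hosts.computeIfAbsent(host, h -> new HostState());
    }

    /** Call when a fresh status for host arrives; it is assumed to cover all earlier assignments. */
    synchronized void onHeartbeat(String host, int reportedOccupiedMapSlots) {
        HostState s = state(host);
        s.reportedOccupied = reportedOccupiedMapSlots;
        s.assignedSinceReport = 0;
    }

    /** Call every time the scheduler assigns a map task to host. */
    synchronized void onMapAssigned(String host) {
        state(host).assignedSinceReport++;
    }

    /** Best current estimate of occupied map slots on host. */
    synchronized int effectiveOccupied(String host) {
        HostState s = state(host);
        return s.reportedOccupied + s.assignedSinceReport;
    }
}
```

Checking the estimate against a per-host limit keeps quick-succession assignment decisions consistent between heartbeats, while completions are still only seen with up to one heartbeat interval of delay.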
 
Is there another approach that would work in this situation? It doesn’t have to be via the FairScheduler. Or, is there a way to increase the frequency of TaskTracker reports, so that my scheduler has semi-accurate slot info?
 
Thanks!
 
-Ben
 
