From: "Siddharth Seth (JIRA)"
To: yarn-issues@hadoop.apache.org
Date: Thu, 7 Feb 2013 06:31:14 +0000 (UTC)
Subject: [jira] [Commented] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler

    [ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573244#comment-13573244 ]

Siddharth Seth commented on YARN-365:
-------------------------------------

This isn't very different from configuring all nodes to have a higher heartbeat interval. With a high heartbeat interval, the NM would send a batch of updates over to the RM, and this heartbeat would trigger a scheduling pass.
This change de-links RM scheduling passes from NM heartbeats. The NM can continue to provide node updates at a smaller interval, and the RM handles these, along with a scheduling pass, as and when it chooses to. In this particular case, the scheduler queue ends up with a single scheduling event per node, but a scheduling run for a node is still attempted only on the next heartbeat from that node. At a later point, scheduling could be changed to be triggered by the arrival of a new application, or to just run in a tight loop.

If the scheduler cannot keep up, it ends up scheduling as fast as it can, without node heartbeats affecting the queue size. Also, completed container information from heartbeats is processed earlier (instead of waiting for the event in the queue to be processed), making each scheduler pass more efficient.

bq. I can see cases where the all at once is actually worse as it will spend more time on a single heartbeat and potentially not get to other things in the queue like apps added as fast.

The event should not be delayed more than the time required to complete one scheduling pass across all nodes. I don't think this will be much better in the case of a growing scheduler queue.

bq. The only way I can see this being beneficial is if we can aggregate the heartbeats and have the scheduler process less.

Do you mean somehow aggregating heartbeats across nodes? This approach does aggregate heartbeats for a single node.
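To make the per-node aggregation concrete, here is a minimal sketch of the idea, not the code in the attached patches. All names (HeartbeatCoalescingSketch, Node, onHeartbeat, processNextEvent) are hypothetical and do not correspond to RM or scheduler classes. Each heartbeat appends its updates to a per-node list, and a scheduler event is queued for the node only if one is not already waiting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class HeartbeatCoalescingSketch {

    /** Hypothetical stand-in for the RM's view of one NodeManager. */
    static final class Node {
        final String id;
        // Updates received in heartbeats but not yet seen by the scheduler.
        private final Queue<String> pendingUpdates = new ConcurrentLinkedQueue<>();
        // True while a scheduling event for this node waits in the scheduler queue.
        private final AtomicBoolean eventQueued = new AtomicBoolean(false);

        Node(String id) { this.id = id; }
    }

    // Holds at most one entry per node, regardless of the heartbeat rate.
    private final Queue<Node> schedulerQueue = new ConcurrentLinkedQueue<>();

    /** Heartbeat path: always record updates, queue an event only if none is pending. */
    void onHeartbeat(Node node, List<String> updates) {
        node.pendingUpdates.addAll(updates);
        if (node.eventQueued.compareAndSet(false, true)) {
            schedulerQueue.add(node);
        }
    }

    /** Scheduler path: take one node event and return its batched updates, or null if idle. */
    List<String> processNextEvent() {
        Node node = schedulerQueue.poll();
        if (node == null) {
            return null;
        }
        // Clear the flag before draining so a heartbeat arriving now queues a fresh event.
        node.eventQueued.set(false);
        List<String> batch = new ArrayList<>();
        String update;
        while ((update = node.pendingUpdates.poll()) != null) {
            batch.add(update);
        }
        return batch; // a real scheduler would run its pass for this node here
    }

    int queuedEvents() { return schedulerQueue.size(); }

    public static void main(String[] args) {
        HeartbeatCoalescingSketch rm = new HeartbeatCoalescingSketch();
        Node nm = new Node("nm-1");
        rm.onHeartbeat(nm, Arrays.asList("c1 completed"));
        rm.onHeartbeat(nm, Arrays.asList("c2 completed"));
        rm.onHeartbeat(nm, Arrays.asList("c3 completed"));
        System.out.println("queued events: " + rm.queuedEvents());
        System.out.println("batch size: " + rm.processNextEvent().size());
    }
}
```

With three heartbeats from the same node and a scheduler that has not yet caught up, the queue holds one event and the scheduler sees all three updates in a single batch, so the queue size is bounded by the number of nodes rather than by the heartbeat rate.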
> Each NM heartbeat should not generate an event for the Scheduler
> ----------------------------------------------------------------
>
>                 Key: YARN-365
>                 URL: https://issues.apache.org/jira/browse/YARN-365
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>    Affects Versions: 0.23.5
>            Reporter: Siddharth Seth
>            Assignee: Xuan Gong
>         Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, YARN-365.2.patch, YARN-365.3.patch
>
>
> Follow up from YARN-275
> https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira