Date: Sun, 7 Aug 2016 02:11:20 +0000 (UTC)
From: "He Tianyi (JIRA)"
To: yarn-dev@hadoop.apache.org
Subject: [jira] [Created] (YARN-5479) FairScheduler: Scheduling performance improvement

He Tianyi created YARN-5479:
-------------------------------

             Summary: FairScheduler: Scheduling performance improvement
                 Key: YARN-5479
                 URL: https://issues.apache.org/jira/browse/YARN-5479
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: fairscheduler, resourcemanager
    Affects Versions: 2.6.0
            Reporter: He Tianyi

Currently ResourceManager uses a single thread to handle asynchronous scheduling events. As the number of nodes grows, FairScheduler has to process more events within the same heartbeat interval, and a growing number of applications and queues also slows down the processing of each individual event.

There are two cases in which slow processing of {{nodeUpdate}} events becomes problematic:

A. Global throughput is lower than the number of node heartbeats arriving per heartbeat round. Resources then sit idle simply because the scheduler cannot keep up.

B. Global throughput is sufficient overall, but in some rounds the events of certain nodes are not processed before their next heartbeat arrives. This hurts bursty workloads (e.g. a newly submitted MapReduce application cannot get all of its tasks launched promptly even though enough resources are available).

Most deployments will eventually hit this problem once a single cluster is scaled to several thousand nodes (even with {{assignmultiple}} enabled).

This issue proposes several performance optimizations in the FairScheduler {{nodeUpdate}} path. To be specific:

A. Trading some fairness for efficiency, queue & application sorting can be skipped (or rather 'delayed'): either a dedicated thread performs the sorting & updating in the background, or the current sort result is reused for a number of calls before re-sorting (say, sort once every 100 calls). A rough sketch of the reuse-based variant is shown below.
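For illustration only, here is a minimal sketch of the 'delayed sorting' idea; the class and method names are hypothetical and not the actual FairScheduler internals:

{code:java}
// Hypothetical sketch of 'delayed sorting' (names are illustrative, not the
// actual FairScheduler internals): reuse the previously sorted snapshot for
// sortInterval consecutive calls instead of re-sorting the child queues/apps
// on every nodeUpdate.
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class DelayedSortingPolicy<T> {
  private final Comparator<T> comparator;
  private final int sortInterval;       // e.g. re-sort once every 100 calls
  private List<T> sortedSnapshot = new ArrayList<T>();
  private int callsSinceSort = 0;

  public DelayedSortingPolicy(Comparator<T> comparator, int sortInterval) {
    this.comparator = comparator;
    this.sortInterval = sortInterval;
  }

  /** Returns a sorted view, re-sorting only when the snapshot is stale. */
  public synchronized List<T> getSorted(List<T> current) {
    // A size change is a crude proxy for apps/queues having been added or
    // removed since the last sort.
    boolean stale = sortedSnapshot.size() != current.size()
        || callsSinceSort >= sortInterval;
    if (sortedSnapshot.isEmpty() || stale) {
      sortedSnapshot = new ArrayList<T>(current);
      Collections.sort(sortedSnapshot, comparator);
      callsSinceSort = 0;
    }
    callsSinceSort++;
    return sortedSnapshot;
  }
}
{code}

The dedicated-thread variant would instead refresh the snapshot from a background thread on a fixed interval, so the assignment path never pays for sorting at all.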
B. Arithmetic on {{Resource}} instances is expensive: with the 'immutable' APIs, at least two objects ({{ResourceImpl}} and its proto builder) are created on every operation. Most of this overhead can be eliminated with a lightweight implementation of Resource that does not instantiate a builder until necessary, since most instances are intermediate results inside the scheduler and are never exchanged over IPC. Also, {{createResource}} uses reflection, which can be replaced by a plain {{new}} (for scheduler-internal usage only). Furthermore, we could perhaps 'intern' common Resource values to avoid allocation altogether. A rough sketch of such a lightweight Resource is appended below the description.

C. Other minor changes: for example, move the {{updateRootMetrics}} call into {{update}}, making root queue metrics eventually consistent (which should satisfy most needs), or back {{getResourceUsage}} with counters so resource usage is updated incrementally instead of being recalculated on every call.

With A and B, I observed roughly a 4x improvement in a cluster with 2K nodes.

Suggestions? Opinions?
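As referenced in item B, here is a minimal standalone sketch of the lightweight-Resource idea; {{SimpleResource}} and its methods are illustrative only and are not the actual {{org.apache.hadoop.yarn.api.records.Resource}} API:

{code:java}
// Illustrative sketch only: a plain-field resource value used for intermediate
// results inside the scheduler. No protobuf builder is created; conversion to
// the proto-backed Resource would happen only when the value actually has to
// cross an IPC boundary.
public final class SimpleResource {
  private long memory;   // MB
  private long vcores;

  public SimpleResource(long memory, long vcores) {
    this.memory = memory;
    this.vcores = vcores;
  }

  public long getMemory() { return memory; }
  public long getVcores() { return vcores; }

  /** In-place addition: no new objects allocated. */
  public SimpleResource addTo(SimpleResource other) {
    this.memory += other.memory;
    this.vcores += other.vcores;
    return this;
  }

  /** In-place subtraction: no new objects allocated. */
  public SimpleResource subtractFrom(SimpleResource other) {
    this.memory -= other.memory;
    this.vcores -= other.vcores;
    return this;
  }

  /** True if this request fits into the given available resource. */
  public boolean fitsIn(SimpleResource available) {
    return memory <= available.memory && vcores <= available.vcores;
  }
}
{code}

Combined with interning of common values (e.g. the per-container request size), most of the per-event garbage from Resource arithmetic should disappear.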