Date: Thu, 9 Jun 2016 02:47:21 +0000 (UTC)
From: "Inigo Goiri (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5202) Dynamic Overcommit of Node Resources - POC

[ https://issues.apache.org/jira/browse/YARN-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321810#comment-15321810 ]

Inigo Goiri commented on YARN-5202:
-----------------------------------

As mentioned in YARN-5215, I think this work fits pretty nicely within YARN-1011.
[~jlowe], [~nroberts], do you guys have any issues with moving this work there? We could use most of this patch over there. For sure, all the UI stuff in this patch should be added to YARN-1011.

> Dynamic Overcommit of Node Resources - POC
> ------------------------------------------
>
>                 Key: YARN-5202
>                 URL: https://issues.apache.org/jira/browse/YARN-5202
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: YARN-5202.patch
>
>
> This Jira is to present a proof-of-concept implementation (collaboration between [~jlowe] and myself) of dynamic over-commit in YARN. The type of over-commit implemented in this jira is similar to, but not as full-featured as, what's being implemented via YARN-1011. YARN-1011 is where we see ourselves heading, but we needed something quick and completely transparent so that we could test it at scale with our varying workloads (mainly MapReduce, Spark, and Tez). Doing so has shed some light on how much additional capacity we can achieve with over-commit approaches, and has fleshed out some of the problems these approaches will face.
> Primary design goals:
> - Avoid changing protocols, application frameworks, or core scheduler logic - simply adjust individual nodes' available resources based on current node utilization and then let the scheduler do what it normally does
> - Over-commit slowly, pull back aggressively - If things are looking good and there is demand, slowly add resource. If memory starts to look over-utilized, aggressively reduce the amount of over-commit.
> - Make sure the nodes protect themselves - i.e. if memory utilization on a node gets too high, preempt something - preferably something from a preemptable queue
> A patch against trunk will be attached shortly. Some notes on the patch:
> - This feature was originally developed against something akin to 2.7. Since the patch is mainly to explain the approach, we didn't do any sort of testing against trunk except for a basic build and basic unit tests
> - The key pieces of functionality are in {{SchedulerNode}}, {{AbstractYarnScheduler}}, and {{NodeResourceMonitorImpl}}. The remainder of the patch is mainly UI, Config, Metrics, Tests, and some minor code duplication (e.g. to optimize node resource changes we treat an over-commit resource change differently than an updateNodeResource change - i.e. remove_node/add_node is just too expensive for the frequency of over-commit changes)
> - We only over-commit memory at this point.
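
For readers skimming the design goals quoted above, here is a rough sketch of what "over-commit slowly, pull back aggressively" and "nodes protect themselves" could look like. This is not taken from YARN-5202.patch. Every name, threshold, and step size below (OvercommitPolicy, next_overcommit_mb, pick_preemption_victim, the 60/80/95% marks) is a hypothetical chosen for illustration, and the additive-grow / multiplicative-shrink rule is just one plausible reading of "slowly" and "aggressively".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OvercommitPolicy:
    """Hypothetical tuning knobs; none of these come from the patch."""
    max_overcommit_mb: int = 16384  # ceiling on extra memory advertised
    grow_step_mb: int = 512         # small additive increase per interval
    shrink_factor: float = 0.5      # multiplicative cut when memory runs hot
    low_util: float = 0.60          # at or below this, growing is considered safe
    high_util: float = 0.80         # at or above this, pull back
    critical_util: float = 0.95     # at or above this, the node preempts


@dataclass
class Container:
    """Hypothetical stand-in for a running container on the node."""
    container_id: str
    in_preemptable_queue: bool


def next_overcommit_mb(current_mb: int, mem_utilization: float,
                       has_demand: bool, policy: OvercommitPolicy) -> int:
    """Return the over-commit amount to advertise for the next interval."""
    if mem_utilization >= policy.high_util:
        # Pull back aggressively: cut the over-commit by a fixed factor.
        return int(current_mb * policy.shrink_factor)
    if mem_utilization <= policy.low_util and has_demand:
        # Over-commit slowly: add one small step, capped at the ceiling.
        return min(current_mb + policy.grow_step_mb, policy.max_overcommit_mb)
    # In between, or with no pending demand: leave things alone.
    return current_mb


def pick_preemption_victim(containers: List[Container],
                           mem_utilization: float,
                           policy: OvercommitPolicy) -> Optional[Container]:
    """Node self-protection: pick something to preempt only when critical."""
    if mem_utilization < policy.critical_util or not containers:
        return None
    # Prefer a container from a preemptable queue; otherwise take the first.
    return min(containers, key=lambda c: not c.in_preemptable_queue)
```

The scheduler itself is untouched in this sketch, which matches the first design goal: only the node's advertised resource changes, and the normal scheduling path reacts to it. How the real patch sizes its steps and chooses a victim may differ.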