Date: Wed, 28 May 2014 20:19:02 +0000 (UTC)
From: "Sandy Ryza (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-2026) Fair scheduler: Fair share for inactive queues causes unfair allocation in some scenarios

    [ https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011523#comment-14011523 ]

Sandy Ryza commented on YARN-2026:
----------------------------------

The nice thing about fair share currently is that it's interpretable as an amount of resources that, as long as you stay under it, you won't get preempted. Changing it to depend on the running apps in the cluster severely complicates this. It used to be that each app's and queue's fair share was min'd with its resource usage + demand, which is sort of a continuous analog to what you're suggesting, but we moved to the current definition when we added multi-resource scheduling.

I'm wondering if the right way to solve this problem is to allow preemption to be triggered at higher levels in the queue hierarchy. I.e., suppose we have the following situation:
* root has two children - parentA and parentB
* each of root's children has two children - childA1, childA2, childB1, and childB2
* the parent queues' minShares are each set to half of the cluster resources
* the child queues' minShares are each set to a quarter of the cluster resources
* childA1 has a third of the cluster resources
* childB1 and childB2 each have a third of the cluster resources

Even though childA1 is above its fair/minShare, we would see that parentA is below its minShare, so we would preempt resources on its behalf. Once we have YARN-596 in, these resources would end up coming from parentB, and end up going to childA1.
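To make that concrete, here is a rough sketch of the ancestor-level check I mean, in plain Java. The Queue class and its usage/minShare fields (as fractions of the cluster) are made-up stand-ins, not the actual FairScheduler types:

{code:java}
// Minimal sketch of ancestor-level preemption triggering. Queue, usage,
// and minShare are simplified stand-ins, not real FairScheduler classes.
class Queue {
  Queue parent;      // null for root
  double usage;      // fraction of the cluster held by this queue's subtree
  double minShare;   // configured minShare, as a fraction of the cluster

  // Preempt on behalf of a queue if it, or any of its ancestors, is below
  // its minShare. In the scenario above: childA1 holds 1/3 > its 1/4
  // minShare, but parentA's subtree holds only 1/3 < its 1/2 minShare,
  // so the check fires at the parentA level.
  boolean shouldTriggerPreemption() {
    for (Queue q = this; q != null; q = q.parent) {
      if (q.usage < q.minShare) {
        return true;
      }
    }
    return false;
  }
}
{code}

The appeal of this direction is that fair share keeps its current, easily interpretable definition; only the preemption trigger walks up the hierarchy.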
> Fair scheduler: Fair share for inactive queues causes unfair allocation in some scenarios
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-2026
>                 URL: https://issues.apache.org/jira/browse/YARN-2026
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>            Reporter: Ashwin Shankar
>            Assignee: Ashwin Shankar
>              Labels: scheduler
>         Attachments: YARN-2026-v1.txt
>
>
> While using hierarchical queues in the fair scheduler, there are a few scenarios where we have seen a leaf queue with the least fair share take the majority of the cluster and starve a sibling parent queue that has a greater weight/fair share, with preemption never kicking in to reclaim resources.
> The root cause seems to be that the fair share of a parent queue is distributed to all its children irrespective of whether each is an active or an inactive (no apps running) queue. Preemption based on fair share kicks in only if the usage of a queue is less than 50% of its fair share and it has demand greater than that. When there are many queues under a parent queue (with a high fair share), each child queue's fair share becomes really low. As a result, when only a few of these child queues have apps running, they reach their *tiny* fair share quickly and preemption doesn't happen even if other leaf queues (non-siblings) are hogging the cluster.
> This can be solved by dividing the fair share of a parent queue only among its active child queues.
> Here is an example describing the problem and the proposed solution:
> root.lowPriorityQueue is a leaf queue with weight 2
> root.HighPriorityQueue is a parent queue with weight 8
> root.HighPriorityQueue has 10 child leaf queues: root.HighPriorityQueue.childQ(1..10)
> The above config results in root.HighPriorityQueue having an 80% fair share (weight 8 out of a total weight of 10), and each of its ten child queues would have an 8% fair share. Preemption would happen for a child queue only if its usage dropped below 4% (0.5 * 8 = 4).
> Let's say at the moment no apps are running in any of root.HighPriorityQueue.childQ(1..10), and a few apps are running in root.lowPriorityQueue, which is taking up 95% of the cluster.
> Up to this point, the behavior of FS is correct.
> Now, let's say root.HighPriorityQueue.childQ1 gets a big job which requires 30% of the cluster. It would get only the 5% available in the cluster, and preemption wouldn't kick in since it is above 4% (half its fair share). This is bad considering childQ1 is under a high-priority parent queue which has an *80% fair share*.
> Until root.lowPriorityQueue starts relinquishing containers, we would see the following allocation on the scheduler page:
> *root.lowPriorityQueue = 95%*
> *root.HighPriorityQueue.childQ1 = 5%*
> This can be solved by distributing a parent's fair share only to active queues.
> So in the example above, since childQ1 is the only active queue under root.HighPriorityQueue, it would get all of its parent's fair share, i.e. 80%.
> This would cause preemption to reclaim the 30% needed by childQ1 from root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
> Also note that a similar situation can happen between root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2 if childQ2 hogs the cluster: childQ2 can take up 95% of the cluster and childQ1 would be stuck at 5% until childQ2 starts relinquishing containers.
> We would like each of childQ1 and childQ2 to get half of root.HighPriorityQueue's fair share, i.e. 40%, which would ensure childQ1 gets up to 40% of the resources if needed, through preemption.
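For comparison, here is a rough sketch of the active-queues-only distribution the description above proposes. The Queue type, the isActive test, and distribute are illustrative assumptions, not the actual FairScheduler code or the attached patch:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FairShareSketch {
  static class Queue {
    final String name;
    final double weight;
    final int numRunnableApps;
    final List<Queue> children;  // null for a leaf queue
    double fairShare;            // computed, as a fraction of the cluster

    Queue(String name, double weight, int numRunnableApps, List<Queue> children) {
      this.name = name;
      this.weight = weight;
      this.numRunnableApps = numRunnableApps;
      this.children = children;
    }

    // A leaf is active if it has runnable apps; a parent is active if any child is.
    boolean isActive() {
      if (children == null) {
        return numRunnableApps > 0;
      }
      return children.stream().anyMatch(Queue::isActive);
    }
  }

  // Divide 'share' among only the active children, in proportion to weight.
  static void distribute(Queue queue, double share) {
    queue.fairShare = share;
    if (queue.children == null) {
      return;
    }
    double activeWeight = queue.children.stream()
        .filter(Queue::isActive)
        .mapToDouble(c -> c.weight)
        .sum();
    for (Queue child : queue.children) {
      double childShare = (child.isActive() && activeWeight > 0)
          ? share * child.weight / activeWeight
          : 0.0;
      distribute(child, childShare);
    }
  }

  public static void main(String[] args) {
    // The example from the description: ten high-priority children, only
    // childQ1 active, and an active lowPriorityQueue.
    Queue childQ1 = new Queue("childQ1", 1, 1, null);
    List<Queue> highChildren = new ArrayList<>();
    highChildren.add(childQ1);
    for (int i = 2; i <= 10; i++) {
      highChildren.add(new Queue("childQ" + i, 1, 0, null));
    }
    Queue high = new Queue("HighPriorityQueue", 8, 0, highChildren);
    Queue low = new Queue("lowPriorityQueue", 2, 1, null);
    Queue root = new Queue("root", 1, 0, Arrays.asList(low, high));
    distribute(root, 1.0);
    System.out.println(childQ1.fairShare);  // 0.8 rather than 0.08
  }
}
{code}

Run on that example, distribute(root, 1.0) gives childQ1 the full 80% instead of 8%, so its 50%-of-fair-share preemption threshold rises from 4% to 40% and childQ1 sitting at 5% becomes eligible for preemption against root.lowPriorityQueue. With both childQ1 and childQ2 active, each gets 40%, matching the paragraph above.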