Date: Wed, 28 May 2014 20:19:02 +0000 (UTC)
From: "Sandy Ryza (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-2026) Fair scheduler: Fair share for inactive queues causes unfair allocation in some scenarios

    [ https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011523#comment-14011523 ]

Sandy Ryza commented on YARN-2026:
----------------------------------

The nice thing about fair share currently is that it's interpretable as an amount of resources that, as long as you stay under it, you won't get preempted. Changing it to depend on the running apps in the cluster severely complicates this. It used to be that each app's and queue's fair share was min'd with its resource usage + demand, which is sort of a continuous analog to what you're suggesting, but we moved to the current definition when we added multi-resource scheduling.

I'm wondering if the right way to solve this problem is to allow preemption to be triggered at higher levels in the queue hierarchy. I.e., suppose we have the following situation:
* root has two children - parentA and parentB
* each of root's children has two children - childA1, childA2, childB1, and childB2
* the parent queues' minShares are each set to half of the cluster resources
* the child queues' minShares are each set to a quarter of the cluster resources
* childA1 has a third of the cluster resources
* childB1 and childB2 each have a third of the cluster resources

Even though childA1 is above its fair/minShare, we would see that parentA is below its minShare, so we would preempt resources on its behalf. Once we have YARN-596 in, these resources would end up coming from parentB, and end up going to childA1.
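To make that concrete, here is a rough sketch of the ancestor-level check I mean, in plain Java. The Queue class and its usage/minShare fields (as fractions of the cluster) are made-up stand-ins, not the actual FairScheduler types:

{code:java}
// Minimal sketch of ancestor-level preemption triggering. Queue, usage,
// and minShare are simplified stand-ins, not real FairScheduler classes.
class Queue {
  Queue parent;      // null for root
  double usage;      // fraction of the cluster held by this queue's subtree
  double minShare;   // configured minShare, as a fraction of the cluster

  // Preempt on behalf of a queue if it, or any of its ancestors, is below
  // its minShare. In the scenario above: childA1 holds 1/3 > its 1/4
  // minShare, but parentA's subtree holds only 1/3 < its 1/2 minShare,
  // so the check fires at the parentA level.
  boolean shouldTriggerPreemption() {
    for (Queue q = this; q != null; q = q.parent) {
      if (q.usage < q.minShare) {
        return true;
      }
    }
    return false;
  }
}
{code}

The appeal of this direction is that fair share keeps its current, easily interpretable definition; only the preemption trigger walks up the hierarchy.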
> Fair scheduler: Fair share for inactive queues causes unfair allocation in some scenarios
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-2026
>                 URL: https://issues.apache.org/jira/browse/YARN-2026
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>            Reporter: Ashwin Shankar
>            Assignee: Ashwin Shankar
>              Labels: scheduler
>         Attachments: YARN-2026-v1.txt
>
>
> While using hierarchical queues in the fair scheduler, there are a few scenarios where we have seen a leaf queue with the least fair share take the majority of the cluster and starve a sibling parent queue that has a greater weight/fair share, with preemption never kicking in to reclaim resources.
> The root cause seems to be that the fair share of a parent queue is distributed to all its children irrespective of whether each is an active or an inactive (no apps running) queue. Preemption based on fair share kicks in only if the usage of a queue is less than 50% of its fair share and it has demand greater than that. When there are many queues under a parent queue (with a high fair share), each child queue's fair share becomes really low. As a result, when only a few of these child queues have apps running, they reach their *tiny* fair share quickly and preemption doesn't happen even if other leaf queues (non-siblings) are hogging the cluster.
> This can be solved by dividing the fair share of a parent queue only among its active child queues.
> Here is an example describing the problem and the proposed solution:
> root.lowPriorityQueue is a leaf queue with weight 2
> root.HighPriorityQueue is a parent queue with weight 8
> root.HighPriorityQueue has 10 child leaf queues: root.HighPriorityQueue.childQ(1..10)
> The above config results in root.HighPriorityQueue having an 80% fair share (weight 8 out of a total weight of 10), and each of its ten child queues would have an 8% fair share. Preemption would happen for a child queue only if its usage dropped below 4% (0.5 * 8 = 4).
> Let's say at the moment no apps are running in any of root.HighPriorityQueue.childQ(1..10), and a few apps are running in root.lowPriorityQueue, which is taking up 95% of the cluster.
> Up to this point, the behavior of FS is correct.
> Now, let's say root.HighPriorityQueue.childQ1 gets a big job which requires 30% of the cluster. It would get only the 5% available in the cluster, and preemption wouldn't kick in since it is above 4% (half its fair share). This is bad considering childQ1 is under a high-priority parent queue which has an *80% fair share*.
> Until root.lowPriorityQueue starts relinquishing containers, we would see the following allocation on the scheduler page:
> *root.lowPriorityQueue = 95%*
> *root.HighPriorityQueue.childQ1 = 5%*
> This can be solved by distributing a parent's fair share only to active queues.
> So in the example above, since childQ1 is the only active queue under root.HighPriorityQueue, it would get all of its parent's fair share, i.e. 80%.
> This would cause preemption to reclaim the 30% needed by childQ1 from root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
> Also note that a similar situation can happen between root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2 if childQ2 hogs the cluster: childQ2 can take up 95% of the cluster and childQ1 would be stuck at 5% until childQ2 starts relinquishing containers.
> We would like each of childQ1 and childQ2 to get half of root.HighPriorityQueue's fair share, i.e. 40%, which would ensure childQ1 gets up to 40% of the resources if needed, through preemption.
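For comparison, here is a rough sketch of the active-queues-only distribution the description above proposes. The Queue type, the isActive test, and distribute are illustrative assumptions, not the actual FairScheduler code or the attached patch:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FairShareSketch {
  static class Queue {
    final String name;
    final double weight;
    final int numRunnableApps;
    final List<Queue> children;  // null for a leaf queue
    double fairShare;            // computed, as a fraction of the cluster

    Queue(String name, double weight, int numRunnableApps, List<Queue> children) {
      this.name = name;
      this.weight = weight;
      this.numRunnableApps = numRunnableApps;
      this.children = children;
    }

    // A leaf is active if it has runnable apps; a parent is active if any child is.
    boolean isActive() {
      if (children == null) {
        return numRunnableApps > 0;
      }
      return children.stream().anyMatch(Queue::isActive);
    }
  }

  // Divide 'share' among only the active children, in proportion to weight.
  static void distribute(Queue queue, double share) {
    queue.fairShare = share;
    if (queue.children == null) {
      return;
    }
    double activeWeight = queue.children.stream()
        .filter(Queue::isActive)
        .mapToDouble(c -> c.weight)
        .sum();
    for (Queue child : queue.children) {
      double childShare = (child.isActive() && activeWeight > 0)
          ? share * child.weight / activeWeight
          : 0.0;
      distribute(child, childShare);
    }
  }

  public static void main(String[] args) {
    // The example from the description: ten high-priority children, only
    // childQ1 active, and an active lowPriorityQueue.
    Queue childQ1 = new Queue("childQ1", 1, 1, null);
    List<Queue> highChildren = new ArrayList<>();
    highChildren.add(childQ1);
    for (int i = 2; i <= 10; i++) {
      highChildren.add(new Queue("childQ" + i, 1, 0, null));
    }
    Queue high = new Queue("HighPriorityQueue", 8, 0, highChildren);
    Queue low = new Queue("lowPriorityQueue", 2, 1, null);
    Queue root = new Queue("root", 1, 0, Arrays.asList(low, high));
    distribute(root, 1.0);
    System.out.println(childQ1.fairShare);  // 0.8 rather than 0.08
  }
}
{code}

Run on that example, distribute(root, 1.0) gives childQ1 the full 80% instead of 8%, so its 50%-of-fair-share preemption threshold rises from 4% to 40% and childQ1 sitting at 5% becomes eligible for preemption against root.lowPriorityQueue. With both childQ1 and childQ2 active, each gets 40%, matching the paragraph above.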