hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4730) YARN preemption based on instantaneous fair share
Date Wed, 24 Feb 2016 21:58:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163873#comment-15163873
] 

Karthik Kambatla commented on YARN-4730:
----------------------------------------

IIRC, FairScheduler preemption is based on the instantaneous fair share. The steady fair share
is used only for the WebUI.

In your case, I would think minshare preemption kicks in because you specify min resources
for all queues. Isn't it expected that all queues get the same resources, summing to the
total cluster resources? Do you expect allocations different from the minshare?

> YARN preemption based on instantaneous fair share
> -------------------------------------------------
>
>                 Key: YARN-4730
>                 URL: https://issues.apache.org/jira/browse/YARN-4730
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: Prabhu Joseph
>
> On a big cluster with total resources of <10 TB, 3000 cores>, the Fair Scheduler has
> 230 queues running about 60000 jobs a day. [All 230 queues are very critical, so
> minResources is the same for all.] In this case, when a Spark job runs on queue A,
> occupies the entire cluster, and does not release any resources, a job submitted to
> queue B can preempt only up to its fair share, which is <10 TB, 3000 cores> / 230 =
> <45 GB, 13 cores>, a very small share for a queue shared by many applications.

> Preemption should instead get the instantaneous fair share, that is <10 TB, 3000 cores> /
> 2 (active queues) = <5 TB, 1500 cores>, so that the first job cannot hog the entire
> cluster and subsequent jobs run fine.
> This issue arises only when the number of queues is very high. With few queues,
> preempting up to the fair share would suffice, since the fair share itself is high. But
> with very many queues, preemption should try to reach the instantaneous fair share.
> Note: Configuring optimal maxResources for 230 queues is difficult, and constraining the
> queues with maxResources would leave cluster resources idle most of the time.
> There are thousands of Spark jobs, so asking each user to restrict the number of
> executors is also difficult.
> Preempting up to the instantaneous fair share would overcome the above issues.
>           
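The arithmetic in the description above can be sketched as follows. This is an illustrative Python snippet, not YARN code; the function names are hypothetical, but the two divisions mirror the steady fair share (cluster divided by all configured queues) versus the instantaneous fair share (cluster divided by active queues only):

```python
# Illustrative sketch (not YARN code): steady vs. instantaneous fair share
# for the cluster described in this issue.

def steady_fair_share(cluster, num_queues):
    """Cluster resources divided among ALL configured queues."""
    return tuple(r / num_queues for r in cluster)

def instantaneous_fair_share(cluster, num_active_queues):
    """Cluster resources divided among only the ACTIVE queues."""
    return tuple(r / num_active_queues for r in cluster)

cluster = (10 * 1024, 3000)  # (memory in GB, vcores): 10 TB, 3000 cores

mem, cores = steady_fair_share(cluster, 230)
print(f"steady: ~{mem:.0f} GB, ~{cores:.0f} cores")        # ~45 GB, ~13 cores

mem, cores = instantaneous_fair_share(cluster, 2)
print(f"instantaneous: {mem:.0f} GB, {cores:.0f} cores")   # 5120 GB, 1500 cores
```

With 230 queues the steady share per queue is tiny, while with only 2 active queues the instantaneous share gives each of them half the cluster, which is the behavior the reporter is asking preemption to target.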



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
