hadoop-yarn-issues mailing list archives

From "Andrey Stepachev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2151) FairScheduler option for global preemption within hierarchical queues
Date Sat, 14 Jun 2014 06:30:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031488#comment-14031488 ]

Andrey Stepachev commented on YARN-2151:

[~ashwinshankar77], thank you for the suggestions.

The proposed patch solves the problem in a straightforward way.
We have minShare for top-level queues; why shouldn't the same work
for a hierarchy? What are the drawbacks? It is not practical
to hand-calculate weights among 40+ queues with different min shares.

Another issue with the weight-based solution is preemption. Weights
control fair share, but preemption can be based on min share and
be more relaxed on fair share. With your workaround to mimic min-share
preemption, the fair share timeout would have to be configured quite low, and it
can't be reconfigured per queue at all
(right now the fair share timeout is always global; I don't know why).
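A minimal sketch of the bottom-up propagation I mean (a hypothetical QueueNode class, not the actual FSQueue code): a parent's effective min share is at least the sum of its children's, so a top-level fair share calculation can see min shares configured deep in the tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical queue tree node, for illustration only.
class QueueNode {
    final String name;
    final int minShareMb;                  // configured minResources (MB), 0 if unset
    final List<QueueNode> children = new ArrayList<>();

    QueueNode(String name, int minShareMb) {
        this.name = name;
        this.minShareMb = minShareMb;
    }

    QueueNode addChild(QueueNode child) {
        children.add(child);
        return this;
    }

    // A parent's effective min share is at least the sum of its
    // children's effective min shares, computed bottom-up.
    int effectiveMinShareMb() {
        int childSum = 0;
        for (QueueNode c : children) {
            childSum += c.effectiveMinShareMb();
        }
        return Math.max(minShareMb, childSum);
    }
}
```

With the queue layout from the issue description (sub11 declares 6192mb), queue1 would report an effective min share of 6192mb even though it declares none itself.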

So, what we get with the proposed patch:
1. It is quite small, and its behaviour can be disabled.
2. It is a more natural way to control the guaranteed share of the cluster,
and it lets the scheduler do its work of computing shares, especially with the
dominant-resource strategy (with weights you have to stick to one dimension of your cluster).
3. The cluster can be configured with different preemption elasticity for different
queue subtrees (say, for different types of workload or different departments).
4. With YARN-1864 and a couple of custom rules, the tree of queues in a
multi-tenant cluster can become quite complicated. min share helps to limit the number of
containers run by MapReduce simultaneously, and many users
want to create subqueues with custom min shares. With min shares not
propagated, applications need to wait out the timeout to preempt containers.
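To illustrate points 3 and 4, a hypothetical allocation file could tune preemption elasticity per subtree (queue names and timeout values here are made up, and this assumes the per-subtree inheritance of minSharePreemptionTimeout that the patch argues for):

{code}
<?xml version="1.0"?>
<allocations>
  <queue name="research">
    <!-- latency-sensitive subtree: preempt aggressively -->
    <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
    <queue name="adhoc">
      <minResources>4096mb, 4vcores</minResources>
    </queue>
  </queue>
  <queue name="batch">
    <!-- throughput subtree: tolerate long waits before preempting -->
    <minSharePreemptionTimeout>600</minSharePreemptionTimeout>
    <queue name="etl">
      <minResources>8192mb, 8vcores</minResources>
    </queue>
  </queue>
</allocations>
{code}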

The weight-based solution is not as easy to use, and moreover, bad things
can happen:
1. Weights have to be precomputed, and for a big cluster it is not obvious
which resource (cpu or memory) the weight calculation should be based on.
2. The FairScheduler will use different paths to obey shares: at first it will
not take min shares into account and will allocate according to fair shares on the
dominant resources of the top queues. Later the scheduler will inspect the leaf queues
and find that a particular queue is under its min share (possibly by a resource
different from the ones used to calculate min share), and it will begin to
preempt. So instead of a clean calculation from the top, we get unnecessary
preemptions and app starvation.
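To make point 2 concrete, here is some toy arithmetic (all numbers assumed; this is not FairScheduler code) for a 10240mb cluster split between two top-level queues, where a deep leaf with 6192mb of minResources ends up starved until the later preemption pass notices it:

```java
// Toy two-phase demo of the problem described above.
class TwoPhaseDemo {
    // Phase 1: naive top-level fair share split between parent queues,
    // equal weights, min shares of deeper queues ignored.
    static int[] topLevelFairSplit(int clusterMb, int numQueues) {
        int[] shares = new int[numQueues];
        for (int i = 0; i < numQueues; i++) {
            shares[i] = clusterMb / numQueues;
        }
        return shares;
    }

    // Phase 2: the preemption pass later notices a leaf below its min
    // share and computes how much must be preempted elsewhere for it.
    static int preemptionNeededMb(int leafAllocatedMb, int leafMinShareMb) {
        return Math.max(0, leafMinShareMb - leafAllocatedMb);
    }
}
```

For example, phase 1 gives each of two parents 5120mb; if the leaf then only actually receives 2048mb against a 6192mb min share, phase 2 has to claw back 4144mb via preemption, instead of the split being computed correctly from the top.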

bq. a. Why don't you want to define minSharePreemptionTimeout in sub11 itself,since you are
anyway configuring it ?

This sub11 can be created by the placement rules of YARN-1864.
Alternatively, we could allow specifying queue parameters in the
queue placement rules; maybe that would be a much better solution, I admit.

bq. b. What if someone doesn't want minSharePreemptionTimeout to be inherited and would want
to use global default ?

Place it into another parent queue or make it a top-level queue. Right now there is no
possibility to override default parameters for queues at all,
and with the provided patch, solutions for both overriding and not overriding are possible.

> FairScheduler option for global preemption within hierarchical queues
> ---------------------------------------------------------------------
>                 Key: YARN-2151
>                 URL: https://issues.apache.org/jira/browse/YARN-2151
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>            Reporter: Andrey Stepachev
>         Attachments: YARN-2151.patch
> FairScheduler has hierarchical queues, but fair share calculation and
> preemption still work within a limited range and are effectively still non-hierarchical.
> This patch addresses this incompleteness in two aspects:
> 1. Currently minShare is not propagated to the parent queue, so the
> fair share calculation ignores all min shares in deeper queues.
> Let's take an example
> (implemented as test case TestFairScheduler#testMinShareInHierarchicalQueues):
> {code}
> <?xml version="1.0"?>
> <allocations>
> <queue name="queue1">
>   <maxResources>10240mb, 10vcores</maxResources>
>   <queue name="big"/>
>   <queue name="sub1">
>     <schedulingPolicy>fair</schedulingPolicy>
>     <queue name="sub11">
>       <minResources>6192mb, 6vcores</minResources>
>     </queue>
>   </queue>
>   <queue name="sub2">
>   </queue>
> </queue>
> </allocations>
> {code}
> Then bigApp is started within queue1.big with 10x1GB containers,
> which effectively eats all of the maximum allowed resources for queue1.
> Subsequent requests for app1 (queue1.sub1.sub11) and
> app2 (queue1.sub2) (5x1GB each) will wait for free resources.
> Note that sub11 has a min share requirement of 6x1GB.
> Without this patch the fair share is calculated with no knowledge
> of the min share requirements, and app1 and app2 get an equal
> number of containers.
> With the patch, resources are split according to min share (in the test
> it is 5 containers for app1 and 1 for app2).
> This behaviour is controlled by the same parameter as 'globalPreemption',
> but that can be changed easily.
> The implementation is a bit awkward, but it seems the method for min share
> recalculation could be exposed as a public or protected API, and the constructor
> in FSQueue could call it before using the minShare getter. For now the
> current implementation with nulls should work too.
> 2. Preemption doesn't work between queues on different levels of the
> queue hierarchy. Moreover, it is not possible to override various
> parameters for child queues.
> This patch adds a parameter 'globalPreemption', which enables the global
> preemption algorithm modifications.
> In a nutshell, the patch adds a function shouldAttemptPreemption(queue),
> which can calculate usage for nested queues; if a queue with usage above
> the specified threshold is found, preemption can be triggered.
> The aggregated minShare does the rest of the work, and preemption works
> as expected within a hierarchy of queues with different minShare/maxShare
> specifications on different levels.
> The test case TestFairScheduler#testGlobalPreemption shows how it works.
> One big app gets resources above its fair share while app1 has a declared
> min share. On submission, the code detects that starvation and preempts enough
> containers to make room for app1.

This message was sent by Atlassian JIRA
