hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
Date Mon, 28 Jul 2014 10:31:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076062#comment-14076062 ]

Wangda Tan commented on YARN-2008:

Hi Craig,
As we discussed in YARN-1198, I think we should consider the resources used by a queue's siblings
when computing headroom. I took another look at your patch; some comments:

We first need to think about how to calculate headroom in general. Based on the sub-JIRAs of
YARN-1198, I think headroom is:
queue_available = min(clusterResource - used_by_sibling_of_parents - used_by_this_queue, queue_max_resource)
headroom = min(queue_available - available_resource_in_blacklisted_nodes, user_limit)
So I think this JIRA focuses on computing {{used_by_sibling_of_parents}}. Is that right?
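
As a rough illustration of the two formulas above, here is a minimal Java sketch that models every resource as a single scalar (for example, memory in MB). The class name, method names, and the sample numbers are hypothetical and are not part of the YARN code base. Clamping negative results to zero is an added assumption that the formulas do not state.

```java
// Hypothetical sketch, not YARN code: resources are plain longs (e.g. MB of memory).
final class HeadroomSketch {

    // queue_available = min(clusterResource - used_by_sibling_of_parents
    //                       - used_by_this_queue, queue_max_resource)
    static long queueAvailable(long clusterResource, long usedBySiblingOfParents,
                               long usedByThisQueue, long queueMaxResource) {
        long free = clusterResource - usedBySiblingOfParents - usedByThisQueue;
        // Assumption: never report a negative amount.
        return Math.max(0L, Math.min(free, queueMaxResource));
    }

    // headroom = min(queue_available - available_resource_in_blacklisted_nodes, user_limit)
    static long headroom(long queueAvailable, long availableInBlacklistedNodes,
                         long userLimit) {
        long usable = queueAvailable - availableInBlacklistedNodes;
        // Assumption: never report a negative amount.
        return Math.max(0L, Math.min(usable, userLimit));
    }

    public static void main(String[] args) {
        // Made-up numbers: 100 units in the cluster, 40 used by siblings of the
        // queue's ancestors, 10 used by the queue itself, queue maximum of 80.
        long available = queueAvailable(100, 40, 10, 80);
        System.out.println("queue_available = " + available);
        System.out.println("headroom = " + headroom(available, 5, 30));
    }
}
```

With these sample inputs the queue has 50 units available, and the user limit of 30 becomes the binding constraint on headroom.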

The general approach looks good to me, except for the following in CSQueueUtils.java (I will
review the tests in the next iteration):
      //sibling used is parent used - my used...
      float siblingUsedCapacity = Resources.ratio(
                 Resources.subtract(parent.getUsedResources(), queue.getUsedResources()),
This computation does not seem robust when the parent's resource is empty, whether because it is
a zero-capacity queue or because its siblings have used 100% of the cluster.
It would be good to add an edge-case test to guard against this division by zero as well.
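
One possible shape for such a guard is sketched below, using plain floats instead of YARN's Resource objects. The snippet quoted above is truncated, so the denominator (called parentResource here) is an assumption, as are the class and method names. Returning 1.0 when the parent has no resource is one conservative choice (treat everything as already used), not necessarily what the patch should do.

```java
// Hypothetical sketch, not YARN code: scalar floats stand in for Resource objects.
final class SiblingUsedSketch {

    // Fraction of the parent's resource consumed by this queue's siblings.
    static float siblingUsedCapacity(float parentUsed, float queueUsed,
                                     float parentResource) {
        if (parentResource <= 0f) {
            // Assumption: an empty parent leaves nothing to hand out, so report
            // "fully used" instead of dividing by zero (which would yield NaN
            // or Infinity for floats).
            return 1f;
        }
        // Sibling used is parent used minus this queue's used; never negative.
        float siblingUsed = Math.max(0f, parentUsed - queueUsed);
        return siblingUsed / parentResource;
    }

    public static void main(String[] args) {
        System.out.println(siblingUsedCapacity(60f, 20f, 100f)); // ordinary case
        System.out.println(siblingUsedCapacity(0f, 0f, 0f));     // zero-capacity parent
    }
}
```

An edge-case test would then assert that the zero-capacity input yields a finite value in the expected range.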

It would also be better to explicitly clamp the value in {{return absoluteMaxAvail}} to the
range \[0, 1\] to guard against floating-point errors.
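
A minimal sketch of such a clamp follows. The helper name is hypothetical, and mapping NaN to 0 is an assumption about the desired behaviour rather than something the patch specifies.

```java
// Hypothetical helper, not YARN code: keeps a computed fraction inside [0, 1].
final class ClampSketch {

    static float clampToUnitInterval(float value) {
        if (Float.isNaN(value)) {
            // Assumption: treat an undefined ratio as "nothing available".
            return 0f;
        }
        return Math.max(0f, Math.min(1f, value));
    }

    public static void main(String[] args) {
        // Rounding can push a ratio slightly outside the valid range.
        System.out.println(clampToUnitInterval(1.0000001f));
        System.out.println(clampToUnitInterval(-0.01f));
        System.out.println(clampToUnitInterval(0.3f));
    }
}
```

The final statement would then read along the lines of returning clampToUnitInterval(absoluteMaxAvail).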


> CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

> -----------------------------------------------------------------------------------------
>                 Key: YARN-2008
>                 URL: https://issues.apache.org/jira/browse/YARN-2008
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.3.0
>            Reporter: Chen He
>            Assignee: Craig Welch
>         Attachments: YARN-2008.1.patch, YARN-2008.2.patch
> Suppose there are two queues, Q1 and Q2, both allowed to use 100% of the actual resources in the cluster.
> Q1 and Q2 each currently use 50% of the cluster's actual resources, so no space is actually available.
> With the current method of computing headroom, CapacityScheduler thinks there are still resources
> available for users in Q1, but they have already been used by Q2.
> If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap.
> Here is an example:
>                          rootQueue
>                         /         \
>          L1ParentQueue1             L1ParentQueue2
>    (allowed to use up to 80%      (allowed to use a minimum of 20%
>         of its parent)                 of its parent)
>          /            \
>   L2LeafQueue1        L2LeafQueue2
>   (50% of its parent) (a minimum of 50% of its parent)
> When we calculate the headroom of a user in L2LeafQueue2, the current method assumes L2LeafQueue2
> can use 40% (80%*50%) of the actual rootQueue resources. However, without checking L1ParentQueue1,
> we cannot be sure. It is possible that L1ParentQueue2 has already used 40% of the rootQueue
> resources. In that case, L2LeafQueue2 can actually use only 30% (60%*50%).
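
The arithmetic in the quoted example can be checked with a short Java sketch. It is only an illustration: the class and method names are hypothetical, and capacities are modelled as plain fractions of rootQueue rather than YARN Resource objects.

```java
// Hypothetical sketch, not YARN code: capacities are fractions of rootQueue.
final class HierarchyExampleSketch {

    // Share of rootQueue that a leaf can reach, given its parent's configured
    // maximum, the fraction already used by the parent's sibling, and the
    // leaf's share of its parent.
    static double effectiveLeafMax(double parentMaxFraction,
                                   double siblingUsedFraction,
                                   double leafShareOfParent) {
        // The parent cannot exceed what its sibling has left free.
        double parentAvailable = Math.min(parentMaxFraction, 1.0 - siblingUsedFraction);
        return Math.max(0.0, parentAvailable) * leafShareOfParent;
    }

    public static void main(String[] args) {
        // Ignoring sibling usage: 80% * 50%.
        System.out.printf("naive: %.2f%n", effectiveLeafMax(0.8, 0.0, 0.5));
        // L1ParentQueue2 already uses 40%: min(80%, 60%) * 50%.
        System.out.printf("with sibling usage: %.2f%n", effectiveLeafMax(0.8, 0.4, 0.5));
    }
}
```

The first call reproduces the 40% figure that ignores sibling usage, and the second reproduces the 30% figure once L1ParentQueue2's 40% usage is taken into account.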

This message was sent by Atlassian JIRA
