hadoop-yarn-issues mailing list archives

From "Xianyin Xin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned
Date Mon, 07 Dec 2015 07:27:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044498#comment-15044498 ]

Xianyin Xin commented on YARN-4415:
-----------------------------------

Sorry for the late reply, [~Naganarasimha].
I'm not sure I understand correctly, so please correct me if I'm wrong. There are two cases:
1) the access-labels for a queue are set in the xml, and 2) the access-labels for a queue
are not set. In case 1), the access-labels and the configured capacities (by default 0 for
capacity and 100 for max capacity) are imported. In case 2), the access-labels of the queue
are inherited from its parent, but the capacities of those labels are 0, since
{{setupConfigurableCapacities()}} only considers the access-labels configured in the xml.
{code}
    this.accessibleLabels =
        csContext.getConfiguration().getAccessibleNodeLabels(getQueuePath());
    this.defaultLabelExpression = csContext.getConfiguration()
        .getDefaultNodeLabelExpression(getQueuePath());

    // inherit from parent if labels not set
    if (this.accessibleLabels == null && parent != null) {
      this.accessibleLabels = parent.getAccessibleNodeLabels();
    }
    
    // inherit from parent if labels not set
    if (this.defaultLabelExpression == null && parent != null
        && this.accessibleLabels.containsAll(parent.getAccessibleNodeLabels())) {
      this.defaultLabelExpression = parent.getDefaultNodeLabelExpression();
    }

    // After we setup labels, we can setup capacities
    setupConfigurableCapacities();
{code}
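
To make the two cases concrete, here is a minimal sketch of the behaviour as I understand it. It is a simplified model, not the actual CapacityScheduler code: the class {{InheritedLabelCapacitySketch}} and the method {{effectiveMaxCapacities}} are hypothetical names, and the defaults (max capacity 1.0 for labels configured in the xml, 0 for inherited ones) are taken from the description above rather than verified against the source.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the two cases; NOT the real CapacityScheduler classes.
class InheritedLabelCapacitySketch {

  /**
   * Returns the max capacity (as a fraction in [0, 1]) the queue ends up with per label.
   *
   * @param configuredLabels labels set for the queue in the xml, or null if none were set
   * @param parentLabels     labels accessible to the parent queue
   */
  static Map<String, Float> effectiveMaxCapacities(
      Set<String> configuredLabels, Set<String> parentLabels) {
    // Inherit the labels from the parent if none are configured for the queue.
    Set<String> accessible = configuredLabels;
    if (accessible == null) {
      accessible = parentLabels != null ? parentLabels : Collections.<String>emptySet();
    }

    Map<String, Float> maxCapacities = new HashMap<>();
    for (String label : accessible) {
      // Assumption from the comment above: only labels configured in the xml get the
      // default max capacity of 100%; inherited labels are left at 0.
      boolean configuredInXml = configuredLabels != null && configuredLabels.contains(label);
      maxCapacities.put(label, configuredInXml ? 1.0f : 0.0f);
    }
    return maxCapacities;
  }

  public static void main(String[] args) {
    Set<String> labels = Collections.singleton("xxx");
    System.out.println("case 1 (configured): " + effectiveMaxCapacities(labels, labels));
    System.out.println("case 2 (inherited):  " + effectiveMaxCapacities(null, labels));
  }
}
```

In this model the inherited label is accessible to the queue but has no capacity behind it, which is the confusing part.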

This causes confusion, because the access-labels inherited from the parent have a max capacity of 0.
If that is the case, I agree that inherited access-labels should have a max capacity of 100 by default.

But for the two scenarios in the description, I feel the final result is reasonable: you
didn't set the access-labels for the queue, and its parent doesn't have them either, so the
label is not explicitly accessible by the queue. However, if the above analysis is right,
the info shown in the web UI is wrong. I think the cause is the following statement in
{{QueueCapacitiesInfo.java}}:

{code}
if (maxCapacity < CapacitySchedulerQueueInfo.EPSILON || maxCapacity > 1f)
        maxCapacity = 1f;
{code}
Here {{maxCapacity}} is set to 1 when {{maxCapacity == 0}}, which is exactly case 2) above.
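
A minimal standalone sketch of that clamp shows the effect. The class {{MaxCapacityDisplaySketch}} and the method {{displayedMaxCapacity}} are hypothetical names, and the {{EPSILON}} value is an assumption standing in for {{CapacitySchedulerQueueInfo.EPSILON}}; only the {{if}} condition is taken from the snippet above.

```java
// Hypothetical standalone copy of the clamp; NOT the real QueueCapacitiesInfo class.
class MaxCapacityDisplaySketch {

  // Assumed value, standing in for CapacitySchedulerQueueInfo.EPSILON.
  static final float EPSILON = 1e-8f;

  /** Returns the max capacity (fraction in [0, 1]) that would be reported to the web UI. */
  static float displayedMaxCapacity(float maxCapacity) {
    // Same condition as the snippet above: a value of 0 (or anything above 1) becomes 1.
    if (maxCapacity < EPSILON || maxCapacity > 1f) {
      maxCapacity = 1f;
    }
    return maxCapacity;
  }

  public static void main(String[] args) {
    // A label with max capacity 0 (case 2) is reported as 1, i.e. 100%.
    System.out.println("configured 0.0 -> displayed " + displayedMaxCapacity(0f));
    System.out.println("configured 0.5 -> displayed " + displayedMaxCapacity(0.5f));
  }
}
```

So the scheduler itself works with a max capacity of 0 for the label, while the UI reports 100%.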

cc [~leftnoteasy].

> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4415
>                 URL: https://issues.apache.org/jira/browse/YARN-4415
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>         Attachments: App info with diagnostics info.png, screenshot-1.png
>
>
> Steps to reproduce the issue :
> Scenario 1:
> # Configure a queue(default) with accessible node labels as *
> # create a exclusive partition *xxx* and map a NM to it
> # ensure no capacities are configured for default for label xxx
> # start an RM app with queue as default and label as xxx
> # application is stuck but scheduler ui shows 100% as max capacity for that queue
> Scenario 2:
> # create a nonexclusive partition *sharedPartition* and map a NM to it
> # ensure no capacities are configured for default queue
> # start an RM app with queue as *default* and label as *sharedPartition*
> # application is stuck but scheduler ui shows 100% as max capacity for that queue for *sharedPartition*
> For both issues cause is the same default max capacity and abs max capacity is set to Zero %



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
