hadoop-yarn-issues mailing list archives

From "Ying Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned
Date Fri, 06 Jan 2017 10:15:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803880#comment-15803880 ]

Ying Zhang edited comment on YARN-4415 at 1/6/17 10:15 AM:
-----------------------------------------------------------

Hi [~Naganarasimha], [~leftnoteasy], [~xinxianyin], we've encountered the same issue during
our testing, and noticed that this JIRA has been open for a while. I understand the reasoning
[~leftnoteasy] and [~xinxianyin] gave for choosing 0 or 100 as the default max capacity value
when it is not set. But the current problem is that we use 0 as the default max capacity
internally (using QueueCapacities.LABEL_DOESNT_EXIST_CAP) when allocating resources, while
the RM Scheduler UI shows 100 as the max capacity (because the class PartitionQueueCapacitiesInfo
uses 100 as the default value for maxCapacity). Could we use the same default value in both
places to avoid the inconsistency?
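As a quick illustration of the mismatch, here is a minimal sketch (hypothetical names and values only, not the actual CapacityScheduler or PartitionQueueCapacitiesInfo code) of two lookup paths falling back to different defaults for the same unconfigured partition:
{code:java}
// Illustrative sketch only -- hypothetical names, not the actual
// CapacityScheduler / PartitionQueueCapacitiesInfo code.
import java.util.HashMap;
import java.util.Map;

public class MaxCapacityDefaultSketch {

    // Max capacities configured per partition label; a label with no
    // configured capacity is simply absent from the map.
    private static final Map<String, Float> CONFIGURED_MAX = new HashMap<>();

    // Scheduler-side lookup: an unconfigured label falls back to 0%.
    static float schedulerMaxCapacity(String label) {
        return CONFIGURED_MAX.getOrDefault(label, 0.0f);
    }

    // Web-UI-side lookup: the same unconfigured label falls back to 100%.
    static float webUiMaxCapacity(String label) {
        return CONFIGURED_MAX.getOrDefault(label, 100.0f);
    }

    public static void main(String[] args) {
        String label = "xxx"; // partition with no capacity configured for the queue
        System.out.println("scheduler uses : " + schedulerMaxCapacity(label) + "%"); // 0.0%
        System.out.println("web UI displays: " + webUiMaxCapacity(label) + "%");     // 100.0%
    }
}
{code}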
{quote}
But I think there's one thing we need to fix:
When queue.accessible-node-labels == *, QueueCapacitiesInfo#QueueCapacitiesInfo(QueueCapacities)
should call RMNodeLabelsManager.getClusterNodeLabelNames to get all labels instead of calling
getExistingNodeLabels. So after we add/remove labels, queue's capacities in webUI/REST response
will be updated as well.
{quote}
[~leftnoteasy], I'm not sure I understand what you mean, but it might be better to keep
using getExistingNodeLabels so that only the node label partitions the queue has access
to are shown in the RM Scheduler UI.
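To make the intent concrete, a rough sketch (hypothetical names only, not the real RMNodeLabelsManager or QueueCapacitiesInfo code) of restricting the rendered partitions to the queue's accessible-node-labels:
{code:java}
// Illustrative sketch only -- hypothetical names, not Hadoop's
// RMNodeLabelsManager or QueueCapacitiesInfo implementation.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AccessibleLabelsSketch {
    public static void main(String[] args) {
        // All node label partitions currently defined in the cluster
        // ("" is the default partition).
        Set<String> clusterLabels =
                new HashSet<>(Arrays.asList("", "xxx", "sharedPartition"));

        // Partitions this queue may access (its accessible-node-labels).
        Set<String> accessibleLabels =
                new HashSet<>(Arrays.asList("", "sharedPartition"));

        // Render only the intersection, so a partition the queue cannot
        // access (e.g. "xxx") never shows up in its capacities view.
        Set<String> shownInUi = new HashSet<>(clusterLabels);
        shownInUi.retainAll(accessibleLabels);
        System.out.println("partitions shown for the queue: " + shownInUi);
    }
}
{code}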


was (Author: ying zhang):
Hi [~Naganarasimha], [~leftnoteasy], [~xinxianyin], we've encountered the same issue during
our testing, and noticed that this JIRA has been open for a while. I understand the reasoning
[~leftnoteasy] and [~xinxianyin] gave for choosing 0 or 100 as the default max capacity value
when it is not set. But the current problem is that we use 0 as the default max capacity
internally (using the macro CSQueueUtils.EPSILON) when allocating resources, while the
RM Scheduler UI shows 100 as the max capacity (because the class PartitionQueueCapacitiesInfo
uses 100 as the default value in this case). Could we use the same default value in both
places to avoid the inconsistency?
{quote}
But I think there's one thing we need to fix:
When queue.accessible-node-labels == *, QueueCapacitiesInfo#QueueCapacitiesInfo(QueueCapacities)
should call RMNodeLabelsManager.getClusterNodeLabelNames to get all labels instead of calling
getExistingNodeLabels. So after we add/remove labels, queue's capacities in webUI/REST response
will be updated as well.
{quote}
[~leftnoteasy], I'm not sure I understand what you mean, but it might be better to keep
using getExistingNodeLabels so that only the node label partitions the queue has access
to are shown in the RM Scheduler UI.

> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4415
>                 URL: https://issues.apache.org/jira/browse/YARN-4415
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>         Attachments: App info with diagnostics info.png, capacity-scheduler.xml, screenshot-1.png
>
>
> Steps to reproduce the issue:
> Scenario 1:
> # Configure a queue (default) with accessible-node-labels set to *
> # Create an exclusive partition *xxx* and map an NM to it
> # Ensure no capacities are configured for the default queue for label xxx
> # Start an app with queue as default and label as xxx
> # The application is stuck, but the scheduler UI shows 100% as the max capacity for that queue
> Scenario 2:
> # Create a non-exclusive partition *sharedPartition* and map an NM to it
> # Ensure no capacities are configured for the default queue
> # Start an app with queue as *default* and label as *sharedPartition*
> # The application is stuck, but the scheduler UI shows 100% as the max capacity for that queue for *sharedPartition*
> For both issues the cause is the same: the default max capacity and absolute max capacity are set to zero % (see the sketch below)
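
A rough arithmetic sketch of why the application stays pending in both scenarios (illustrative numbers only, not the real CSQueueUtils computation):
{code:java}
// Illustrative sketch only -- not the real CSQueueUtils computation.
public class AbsMaxCapacitySketch {
    public static void main(String[] args) {
        float parentAbsMaxCapacity = 1.0f; // parent holds 100% of the partition
        float queueMaxCapacity = 0.0f;     // nothing configured -> falls back to 0

        // Absolute max capacity is the product down the queue hierarchy,
        // so an unconfigured 0% anywhere pins it to 0%.
        float absMaxCapacity = parentAbsMaxCapacity * queueMaxCapacity;
        System.out.println("abs max capacity = " + (absMaxCapacity * 100) + "%"); // 0.0%

        // With 0% absolute max capacity for the partition, no container can
        // ever be allocated there, so the submitted application stays pending.
    }
}
{code}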



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


