hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5077) Fix FSLeafQueue#getFairShare() for queues with weight 0.0
Date Sun, 12 Jun 2016 21:01:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326629#comment-15326629 ]

Karthik Kambatla commented on YARN-5077:
----------------------------------------

Interesting approach on the last patch.

A few comments:
# Can we extend it to address YARN-4866 as well, so we have a uniform approach?
# Instead of checking the weight, we might want to check whether the fair share memory/cpu is 0. That way we also cover cases where the weight is so small that the computed fair share is essentially 0.
# FSQueue#getMaxShare does not appear to check the parent queues. Shouldn't we be checking that? FWIW, I am not a fan of our current approach of querying AllocationConfiguration. Would it be better to use FSQueue to store queue-specific information instead? I am comfortable with tackling that in another JIRA, either before or immediately after this.
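The share-based check suggested in comment 2 can be sketched as follows. This is an illustrative standalone example, not Hadoop code: the Resource record and the fair-share computation here are simplified stand-ins for org.apache.hadoop.yarn.api.records.Resource and the Fair Scheduler's actual share computation.

```java
// Sketch of comment 2: flag a queue as having "no fair share" when the
// computed share's memory or vcores is 0, rather than checking whether the
// configured weight is 0. This also covers weights so small that the share
// truncates to 0. All types here are simplified stand-ins, not Hadoop APIs.
public class FairShareCheck {
    record Resource(long memoryMb, int vcores) {}

    // Weight-based check: misses tiny but nonzero weights.
    static boolean hasZeroWeight(double weight) {
        return weight == 0.0;
    }

    // Share-based check from the comment: a queue whose computed fair share
    // has 0 memory or 0 vcores effectively has no fair share at all.
    static boolean hasZeroFairShare(Resource fairShare) {
        return fairShare.memoryMb() == 0 || fairShare.vcores() == 0;
    }

    // Simplified proportional fair share: cluster * (weight / totalWeight),
    // truncated to whole units, as a stand-in for the real computation.
    static Resource computeFairShare(Resource cluster, double weight, double totalWeight) {
        double fraction = totalWeight == 0.0 ? 0.0 : weight / totalWeight;
        return new Resource((long) (cluster.memoryMb() * fraction),
                            (int) (cluster.vcores() * fraction));
    }

    public static void main(String[] args) {
        Resource cluster = new Resource(16384, 8);
        // A very small weight passes the weight check but still yields a 0 share.
        double tiny = 1e-6, other = 1.0;
        Resource tinyShare = computeFairShare(cluster, tiny, tiny + other);
        System.out.println(hasZeroWeight(tiny));         // false: weight check misses it
        System.out.println(hasZeroFairShare(tinyShare)); // true: share check catches it
    }
}
```

The point of the sketch is that checking the computed share covers both the weight == 0.0 case from this JIRA and the near-zero-weight case in one condition.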


> Fix FSLeafQueue#getFairShare() for queues with weight 0.0
> ---------------------------------------------------------
>
>                 Key: YARN-5077
>                 URL: https://issues.apache.org/jira/browse/YARN-5077
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>         Attachments: YARN-5077.001.patch, YARN-5077.002.patch, YARN-5077.003.patch, YARN-5077.004.patch, YARN-5077.005.patch, YARN-5077.006.patch, YARN-5077.007.patch
>
>
> 1) When a queue's weight is set to 0.0, FSLeafQueue#getFairShare() returns <memory:0, vCores:0>.
> 2) When a queue's weight is nonzero, FSLeafQueue#getFairShare() returns <memory:16384, vCores:8>.
> In case 1), no container ever gets allocated for an AM because, from the viewpoint of the RM, there is never any headroom to allocate a container on that queue.
> For example, we have a pool with the following weights:
> - root.dev 0.0
> - root.product 1.0
> root.dev is a best-effort pool and should only get resources if root.product is not running. In our tests, with no jobs running under root.product, jobs started in the root.dev queue stay stuck in the ACCEPTED state and never start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

