hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
Date Wed, 16 Jul 2014 14:39:06 GMT

    [ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063548 ]

Jason Lowe commented on YARN-1198:

bq. I think it would be worthwhile to do a min of the calculated headroom against "cluster
headroom" as a sanity check, cluster headroom being the total cluster resource - utilized

There are a couple of mins that need to be added to the current headroom calculation to catch
some unsolved deadlock scenarios (again, ignoring blacklisting effects):

- Need to min against the available resources in the current queue, otherwise we don't account
for the resources consumed by other users in the queue.
- Need to min against the available resources in the parent queues (all the way up to the
root queue), otherwise we don't account for the resources consumed elsewhere in the cluster.
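The min chain described above can be sketched as follows. This is a minimal illustration only; the class, field, and method names are hypothetical, not the actual CapacityScheduler API:

```java
// Hypothetical sketch of the proposed headroom calculation; names are
// illustrative, not the real CapacityScheduler classes.
public class HeadroomSketch {
    /** Simplified view of a queue: its absolute limit, current usage, and parent. */
    static class Queue {
        final long limit;     // absolute capacity limit of this queue
        long used;            // resources currently consumed in this queue
        final Queue parent;   // null for the root queue

        Queue(long limit, Queue parent) {
            this.limit = limit;
            this.parent = parent;
        }

        long available() {
            return Math.max(0, limit - used);
        }
    }

    /**
     * Headroom = min(user-limit-based headroom, available resources in the
     * leaf queue and in every ancestor queue up to and including the root).
     */
    static long headroom(long userLimitHeadroom, Queue leaf) {
        long headroom = userLimitHeadroom;
        for (Queue q = leaf; q != null; q = q.parent) {
            headroom = Math.min(headroom, q.available());
        }
        return headroom;
    }
}
```

For example, with a root queue of 100 that has 90 used and a leaf queue of 50 with 20 used, a user-limit headroom of 40 gets clamped to 10 by the root-level min.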

The first min above solves the deadlock where two apps from different users in the same
queue completely exhaust the queue's limits while there are still resources available in the
cluster for other queues.  I believe the patch for YARN-1857 is intended to correct that scenario.

For the second min, a min against the cluster's available resources is equivalent when there
are no hierarchical queues, but it fails to prevent some deadlocks when they are used.  We
could have a parent queue that is not allowed to use all of the cluster's resources, and two leaf
queues underneath it whose max capacity can each be the entire parent queue.  Apps competing
between those two leaf queues could completely saturate the parent queue and deadlock even though
resources are still available at the cluster level.  I think YARN-2008 is related there.
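To make that scenario concrete, here are made-up numbers showing how a min against only the cluster's availability over-reports headroom, while an additional min against the parent queue correctly reports zero:

```java
// Worked numbers for the hierarchical scenario above (values invented for
// illustration): cluster = 100, parent queue capped at 40, leaf queues whose
// max capacity each equals the whole parent.
public class HierarchyExample {
    public static void main(String[] args) {
        long cluster = 100, clusterUsed = 40;   // 60 still free cluster-wide
        long parentCap = 40, parentUsed = 40;   // parent queue fully saturated
        long leafCap = 40, leafUsed = 25;       // this leaf queue's usage

        // Min against cluster availability only: wrongly reports headroom.
        long naive = Math.min(leafCap - leafUsed, cluster - clusterUsed);

        // Also min against the parent queue: correctly reports zero.
        long correct = Math.min(naive, parentCap - parentUsed);

        System.out.println(naive);    // 15 -- an AM could wait forever on this
        System.out.println(correct);  // 0  -- the AM knows nothing is coming
    }
}
```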

In summary, I think we need to add a min against the available resources of every queue
from the current leaf queue up to and including the root queue (i.e.: the whole cluster).

> Capacity Scheduler headroom calculation does not work as expected
> -----------------------------------------------------------------
>                 Key: YARN-1198
>                 URL: https://issues.apache.org/jira/browse/YARN-1198
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-1198.1.patch
> Today headroom calculation (for the app) takes place only when
> * A new node is added to / removed from the cluster
> * A new container is getting assigned to the application.
> However there are potentially a lot of situations which are not considered for this calculation:
> * If a container finishes, then the headroom for that application will change and the AM
> should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the same queue
> ** If app1's container finishes, then not only app1's but also app2's AM should be notified
> about the change in headroom.
> ** Similarly, if a container is assigned to either application (app1/app2), then both AMs
> should be notified about their headroom.
> ** To simplify the whole communication process, it is ideal to keep headroom per user
> per LeafQueue so that everyone gets the same picture (apps belonging to the same user and
> submitted to the same queue).
> * If a new user submits an application to the queue, then all applications submitted by
> all users in that queue should be notified of the headroom change.
> * Also, today headroom is an absolute number (I think it should be normalized, but then
> this is not going to be backward compatible..)
> * Also, when an admin user refreshes queues, headroom has to be updated.
> These are all potential bugs in headroom calculations.

This message was sent by Atlassian JIRA
