hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
Date Mon, 21 Jul 2014 03:15:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068147#comment-14068147 ]

Wangda Tan commented on YARN-1198:
----------------------------------

I've just taken a look at all the sub-tasks of this JIRA, and I'm wondering if we should first define what "headroom" means.
Previously in YARN, including in YARN-1198, headroom is defined as "the maximum resource an application can get".
In YARN-2008, headroom is defined as "the available resource an application can get", because the used resource of sibling queues is already taken into account.

I'm afraid we may need to add a new field, something like a "guaranteed headroom" of an application, which considers its queue's absolute capacity (not maximum capacity), user limits, etc. We may want to keep both of them because:
- The maximum resource is not always achievable, since the sum of the maximum resources of the leaf queues may exceed the cluster resource.
- With preemption, resource beyond the guaranteed resource is likely to be preempted, so it should be considered temporary.

And with this, an AM can (see the sketch below):
- Use the "guaranteed headroom" to allocate resources that will not be preempted.
- Use the "maximum headroom" to try to allocate resources beyond its guaranteed headroom.

In my humble opinion, "the available resource an application can get" doesn't make a lot of sense here, and it may cause some backward-compatibility problems as well. In a dynamic cluster the number can change rapidly; the cluster could be filled up by another application just one second after the AM received the "available headroom". This field also cannot solve the deadlock problem: a malicious application can ask for much more resource than this, and a careless developer may ignore the field entirely. The only valid solution I can think of is to put such logic on the scheduler side and enforce resource usage via the preemption policy.

Any thoughts? [~jlowe], [~cwelch]

Thanks,
Wangda

> Capacity Scheduler headroom calculation does not work as expected
> -----------------------------------------------------------------
>
>                 Key: YARN-1198
>                 URL: https://issues.apache.org/jira/browse/YARN-1198
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-1198.1.patch
>
>
> Today, headroom calculation (for the app) takes place only when:
> * A new node is added to / removed from the cluster
> * A new container is assigned to the application.
> However, there are potentially a lot of situations which are not considered in this calculation:
> * If a container finishes, the headroom for that application changes and the AM should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the same queue, then:
> ** If app1's container finishes, not only app1's but also app2's AM should be notified about the change in headroom.
> ** Similarly, if a container is assigned to either application (app1/app2), both AMs should be notified about their headroom.
> ** To simplify the whole communication process, it is ideal to keep headroom per user per LeafQueue so that everyone gets the same picture (apps belonging to the same user and submitted to the same queue).
> * If a new user submits an application to the queue, then all applications submitted by all users in that queue should be notified of the headroom change.
> * Also, today headroom is an absolute number (I think it should be normalized, but then this would not be backward compatible..).
> * Also, when an admin refreshes the queues, headroom has to be updated.
> These are all potential bugs in the headroom calculation.
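
To illustrate the "headroom per user per LeafQueue" idea from the description above, here is a rough, hypothetical sketch; the names and structure are assumptions, not the actual CapacityScheduler code:

{code:java}
// Hypothetical sketch of per-user, per-LeafQueue headroom bookkeeping.
// Resources are simplified to a single long (e.g. MB of memory).
import java.util.HashMap;
import java.util.Map;

class LeafQueueHeadroomSketch {
  private final long userLimit;                          // per-user limit in this queue
  private final Map<String, Long> usedByUser = new HashMap<>();

  LeafQueueHeadroomSketch(long userLimit) {
    this.userLimit = userLimit;
  }

  // Called on container assignment (delta > 0) and container completion (delta < 0).
  // Returns the recomputed headroom that every AM of this user in this queue
  // should be told about, so all of them see the same picture.
  long updateAndGetHeadroom(String user, long delta) {
    long used = usedByUser.merge(user, delta, Long::sum);
    return Math.max(0, userLimit - used);
  }
}
{code}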



--
This message was sent by Atlassian JIRA
(v6.2#6252)
