hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
Date Mon, 21 Jul 2014 14:06:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068545#comment-14068545 ]

Jason Lowe commented on YARN-1198:

We need to treat headroom as a report of the resources that can be allocated if requested.
If the AM cannot currently allocate any containers, then the headroom should be reported as
zero.  I think "guaranteed headroom" belongs in a separate JIRA and is not necessary to solve
the deadlock issues surrounding the current headroom reporting.
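To make "resources that can be allocated if requested" concrete, here is a minimal Java sketch, assuming a memory-only model in which headroom is the smallest of the room left under the user limit, the room left in the queue, and the free space in the cluster, clamped at zero. The class `HeadroomCalculator` and its parameters are hypothetical illustrations, not the CapacityScheduler's actual code.

```java
// Hypothetical, simplified model of the headroom for one user in one leaf queue.
// Memory only (MB); the real CapacityScheduler also tracks vcores.
final class HeadroomCalculator {
    private HeadroomCalculator() {}

    /**
     * Headroom = what the user could be allocated right now if it asked.
     * Taken as the smallest of three limits and never negative.
     */
    static long headroomMb(long userLimitMb, long userUsedMb,
                           long queueMaxMb, long queueUsedMb,
                           long clusterFreeMb) {
        long byUserLimit = userLimitMb - userUsedMb; // room left under the user limit
        long byQueueMax = queueMaxMb - queueUsedMb;  // room left in the whole queue
        long h = Math.min(byUserLimit, Math.min(byQueueMax, clusterFreeMb));
        return Math.max(0L, h);                      // "cannot allocate" is reported as 0
    }
}

class HeadroomDemo {
    public static void main(String[] args) {
        // The user is 4096 MB under its own limit, but the queue is already full,
        // so nothing can actually be allocated: the reported headroom is 0.
        System.out.println(HeadroomCalculator.headroomMb(8192, 4096, 16384, 16384, 10000));
    }
}
```

The demo prints 0: reporting the 4096 MB left under the user limit would tell the AM it can get resources it will never receive in practice.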

bq. Because in a dynamic cluster the number can change rapidly, it is possible that the cluster
is filled by another application just one second after the AM got the "available

Sure, this can happen.  However, on the next heartbeat the headroom will be reported as lower
than before, and the AM can take appropriate action.  I don't see this as a major issue,
at least in the short term.  Telling an AM repeatedly that it can allocate resources that
will never be allocated in practice is definitely wrong and needs to be fixed.
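The "next heartbeat" argument can be sketched on the AM side. In YARN the AM receives its headroom in each allocate response (`AllocateResponse#getAvailableResources()`); the sketch below abstracts that into a plain number and assumes every container the AM asks for has the same size. `HeadroomTracker` and `onHeartbeat` are hypothetical names, not Hadoop API.

```java
// Hypothetical AM-side bookkeeping; not part of any Hadoop API.
final class HeadroomTracker {
    private final long containerMb;   // size of every container this AM asks for
    private long lastHeadroomMb = -1; // -1 until the first heartbeat arrives

    HeadroomTracker(long containerMb) {
        this.containerMb = containerMb;
    }

    /**
     * Called with the headroom from each heartbeat.
     * Returns how many of the pending asks the current headroom can cover.
     */
    int onHeartbeat(long headroomMb, int pendingAsks) {
        boolean shrank = lastHeadroomMb >= 0 && headroomMb < lastHeadroomMb;
        lastHeadroomMb = headroomMb;
        int coverable = (int) Math.min(pendingAsks, headroomMb / containerMb);
        if (shrank && coverable < pendingAsks) {
            // This is where an AM would take "appropriate action", e.g. scale back
            // or cancel asks, or decide to free some of its own containers.
            System.out.println("headroom fell to " + headroomMb + " MB: only "
                + coverable + " of " + pendingAsks + " asks can be met");
        }
        return coverable;
    }
}

class TrackerDemo {
    public static void main(String[] args) {
        HeadroomTracker t = new HeadroomTracker(1024);
        t.onHeartbeat(8192, 4); // all 4 asks fit
        t.onHeartbeat(2048, 4); // only 2 fit, and a warning is printed
    }
}
```

A stale or optimistic headroom only delays this reaction by one heartbeat, whereas a headroom that never drops means the reaction never happens.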

bq. And also, this field cannot solve the deadlock problem either: a malicious application
can ask for much more than this, or a careless developer can ignore this field entirely.

A malicious application cannot cause another application to deadlock as long as the YARN scheduler
properly enforces user limits and properly reports the headroom to applications.  It seems
to me the worst case is that an application hurts itself, but since the entire application
can be custom user code, there's not much YARN can do to prevent that.
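Why enforced user limits contain a greedy application can be shown with a toy queue. This is a minimal Java sketch, assuming a fixed per-user limit counted in whole containers; the real CapacityScheduler derives user limits dynamically, and `ToyLeafQueue` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Toy leaf queue counted in whole containers, with a fixed per-user limit.
// Hypothetical model; the real CapacityScheduler derives user limits dynamically.
final class ToyLeafQueue {
    private final int capacity;  // containers the whole queue may hold
    private final int userLimit; // containers any single user may hold
    private final Map<String, Integer> used = new HashMap<>();

    ToyLeafQueue(int capacity, int userLimit) {
        this.capacity = capacity;
        this.userLimit = userLimit;
    }

    int headroom(String user) {
        int queueUsed = 0;
        for (int u : used.values()) queueUsed += u;
        int userUsed = used.getOrDefault(user, 0);
        return Math.max(0, Math.min(userLimit - userUsed, capacity - queueUsed));
    }

    /** Grants at most the user's headroom, no matter how much was asked for. */
    int allocate(String user, int asked) {
        int granted = Math.min(asked, headroom(user));
        used.merge(user, granted, Integer::sum);
        return granted;
    }
}

class LimitDemo {
    public static void main(String[] args) {
        ToyLeafQueue q = new ToyLeafQueue(10, 5);
        System.out.println(q.allocate("greedy", 100)); // 5: capped by the user limit
        System.out.println(q.allocate("alice", 2));    // 2: greedy could not starve alice
        System.out.println(q.headroom("greedy"));      // 0
        System.out.println(q.headroom("alice"));       // 3
    }
}
```

Asking for 100 containers gains the greedy user nothing beyond its limit; the only party it can hurt is itself, by waiting on asks that will not be granted.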

bq. The only valid solution in my head is putting such logic on the scheduler side and enforcing
resource usage through a preemption policy.

The problem is that the scheduler does not, and IMHO should not, know the details of the particular
application.  For example, let's say an application's headroom goes to zero but it has outstanding
allocation requests.  Should the YARN scheduler automatically preempt something when this
occurs?  If so, which container should it preempt?  These are questions an AM can answer optimally,
including the answer of preempting nothing (e.g., a task is about to complete), while I don't
see how the YARN scheduler can make good decisions without either putting application-specific
logic in the YARN scheduler or having the YARN scheduler defer to the AM to make the decision.
Reporting the headroom to the AM enables the AM to make an application-optimal decision about
what to do, if anything, when the resources available to the application change.
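The kind of application-specific decision described above can be sketched as an AM-side policy. This is a minimal Java sketch under stated assumptions: the AM knows each running task's progress and an estimate of its remaining time, "imminent" is a fixed threshold, and the victim is the task with the least progress. `RunningTask`, `SelfPreemptionPolicy`, and the threshold are hypothetical, not taken from any YARN or MapReduce code.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical AM-side view of one of its running containers.
final class RunningTask {
    final String id;
    final double progress;     // 0.0 to 1.0
    final long estRemainingMs; // the AM's own estimate of time left

    RunningTask(String id, double progress, long estRemainingMs) {
        this.id = id;
        this.progress = progress;
        this.estRemainingMs = estRemainingMs;
    }
}

// Hypothetical policy: only the AM knows enough to pick a victim, or to pick none.
final class SelfPreemptionPolicy {
    private final long imminentMs; // tasks finishing sooner than this free room by themselves

    SelfPreemptionPolicy(long imminentMs) {
        this.imminentMs = imminentMs;
    }

    /** Returns the task to kill, or empty if the AM should just wait. */
    Optional<RunningTask> choose(long headroomMb, int outstandingAsks, List<RunningTask> running) {
        if (headroomMb > 0 || outstandingAsks == 0 || running.isEmpty()) {
            return Optional.empty(); // not blocked, so nothing to preempt
        }
        boolean finishingSoon = running.stream().anyMatch(t -> t.estRemainingMs <= imminentMs);
        if (finishingSoon) {
            return Optional.empty(); // waiting costs less than throwing away work
        }
        // Sacrifice the task with the least progress, i.e. the least work lost.
        return running.stream().min(Comparator.comparingDouble((RunningTask t) -> t.progress));
    }
}

class PreemptDemo {
    public static void main(String[] args) {
        SelfPreemptionPolicy p = new SelfPreemptionPolicy(5_000);
        List<RunningTask> running = Arrays.asList(
            new RunningTask("r1", 0.9, 20_000),
            new RunningTask("r2", 0.1, 60_000));
        System.out.println(p.choose(0, 3, running).map(t -> t.id).orElse("none")); // r2
    }
}
```

Every input this policy uses (progress, time-remaining estimates, what counts as "imminent") is application knowledge the scheduler does not have; the scheduler only needs to supply an honest headroom.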

> Capacity Scheduler headroom calculation does not work as expected
> -----------------------------------------------------------------
>                 Key: YARN-1198
>                 URL: https://issues.apache.org/jira/browse/YARN-1198
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-1198.1.patch
> Today the headroom calculation (for the app) takes place only when:
> * A node is added to or removed from the cluster.
> * A new container is assigned to the application.
> However, there are potentially many situations that this calculation does not consider:
> * If a container finishes, the headroom for that application changes and the AM should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the same queue:
> ** If app1's container finishes, then not only app1's AM but also app2's AM should be notified about the change in headroom.
> ** Similarly, if a container is assigned to either app1 or app2, then both AMs should be notified about their headroom.
> ** To simplify the whole communication process, it is ideal to keep headroom per user per LeafQueue so that everyone (apps belonging to the same user and submitted to the same queue) gets the same picture.
> * If a new user submits an application to the queue, then all applications submitted by all users in that queue should be notified of the headroom change.
> * Also, today headroom is an absolute number (I think it should be normalized, but then this would not be backward compatible).
> * Also, when an admin user refreshes the queues, headroom has to be updated.
> These are all potential bugs in the headroom calculation.
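The per-user, per-LeafQueue proposal in the description can be sketched as a queue that recomputes every user's headroom on each of the listed events, so all apps of one user read the same number. This is a minimal Java sketch assuming memory only and a user limit equal to an equal share among users with apps; that share rule is a simplification, and `PerUserHeadroomQueue` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of one leaf queue that keeps headroom per user (not per app)
// and recomputes it for every user whenever an event changes the queue's state.
final class PerUserHeadroomQueue {
    private long capacityMb;
    private final Map<String, Long> usedByUser = new LinkedHashMap<>(); // user -> MB held
    private final Map<String, String> appOwner = new HashMap<>();       // app -> user
    private final Map<String, Long> headroomByUser = new HashMap<>();   // user -> MB

    PerUserHeadroomQueue(long capacityMb) {
        this.capacityMb = capacityMb;
    }

    // Each event the issue lists triggers a full recompute.
    void submitApp(String app, String user) {
        appOwner.put(app, user);
        usedByUser.putIfAbsent(user, 0L); // a new user changes everyone's limit
        recompute();
    }

    void containerAssigned(String app, long mb) {
        usedByUser.merge(appOwner.get(app), mb, Long::sum);
        recompute();
    }

    void containerFinished(String app, long mb) {
        usedByUser.merge(appOwner.get(app), -mb, Long::sum);
        recompute();
    }

    void refreshQueue(long newCapacityMb) {
        capacityMb = newCapacityMb;
        recompute();
    }

    /** What the AM of this app would be told on its next heartbeat. */
    long headroomFor(String app) {
        return headroomByUser.get(appOwner.get(app));
    }

    private void recompute() {
        long queueUsed = 0;
        for (long u : usedByUser.values()) queueUsed += u;
        // Simplification: the user limit is an equal share among users with apps.
        long userLimit = capacityMb / Math.max(1, usedByUser.size());
        for (Map.Entry<String, Long> e : usedByUser.entrySet()) {
            long h = Math.min(userLimit - e.getValue(), capacityMb - queueUsed);
            headroomByUser.put(e.getKey(), Math.max(0L, h));
        }
    }
}

class PerUserDemo {
    public static void main(String[] args) {
        PerUserHeadroomQueue q = new PerUserHeadroomQueue(16384);
        q.submitApp("app1", "alice");
        q.submitApp("app2", "alice");
        q.containerAssigned("app1", 4096);
        System.out.println(q.headroomFor("app2")); // 12288: app2 sees app1's allocation
        q.submitApp("app3", "bob");
        System.out.println(q.headroomFor("app1")); // 4096: bob's arrival halves alice's limit
    }
}
```

Storing one value per user rather than per app is what makes app1's and app2's AMs agree without any extra notification path between them.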

This message was sent by Atlassian JIRA
