hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2592) Preemption can kill containers to fulfil need of already over-capacity queue.
Date Wed, 24 Sep 2014 19:23:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146757#comment-14146757 ]

Jason Lowe commented on YARN-2592:

IMHO users shouldn't be complaining if they are getting their guarantees (i.e.: the capacity
of the queue).  Anything over capacity is "bonus", and they shouldn't rely on the scheduler
going out of its way to give them more.  If they can't get their work done within their configured
capacity then they need more capacity.
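For the CapacityScheduler, "getting more capacity" is a config change rather than a scheduler behavior change. A hedged sketch of what that looks like in capacity-scheduler.xml (queue name "A" and the percentages are purely illustrative, not from this issue):

```xml
<!-- Illustrative only: raise queue A's guarantee instead of relying on
     preemption to feed it while it is over capacity. -->
<property>
  <name>yarn.scheduler.capacity.root.A.capacity</name>
  <value>40</value>
</property>
<!-- Optional cap on how much "bonus" A may consume beyond its guarantee. -->
<property>
  <name>yarn.scheduler.capacity.root.A.maximum-capacity</name>
  <value>60</value>
</property>
```

Sibling queue capacities under root must still sum to 100, so bumping A implies shrinking another queue's guarantee.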

bq. I think promoting proper handling of preemption on the app side (i.e., checkpoint your
state, or redistribute your computation) is overall a healthier direction.

I agree with the theory.  If preempting is "cheap" then we should leverage it more often to
solve resource contention.  The problem in practice is that it's often outside the hands of
ops and even the users.  YARN is becoming more and more general, including app frameworks
that aren't part of the core Hadoop stack, and I think it will be commonplace for quite some
time that at least some apps won't have checkpoint/migration support.  That makes preemption
not-so-cheap, which means we don't want to use it unless really necessary.  Killing containers
to give another queue more "bonus" resources is unnecessary, so it's preferable to avoid
when preemption isn't cheap.  If those resources really are necessary then the queue should
have more guaranteed capacity rather than expect the scheduler to kill other containers when
it's beyond capacity.

> Preemption can kill containers to fulfil need of already over-capacity queue.
> -----------------------------------------------------------------------------
>                 Key: YARN-2592
>                 URL: https://issues.apache.org/jira/browse/YARN-2592
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.5.1
>            Reporter: Eric Payne
> There are scenarios in which one over-capacity queue can cause preemption of another
> over-capacity queue. However, since killing containers may lose work, it doesn't make sense
> to me to kill containers to feed an already over-capacity queue.
> Consider the following:
> {code}
> root has A,B,C, total capacity = 90
> A.guaranteed = 30, A.pending = 5, A.current = 40
> B.guaranteed = 30, B.pending = 0, B.current = 50
> C.guaranteed = 30, C.pending = 0, C.current = 0
> {code}
> In this case, the queue preemption monitor will kill 5 resources from queue B so that
> queue A can pick them up, even though queue A is already over its capacity. This could lose
> any work that those containers in B had already done.
> Is there a use case for this behavior? It seems to me that if a queue is already over
> its capacity, it shouldn't destroy the work of other queues. If the over-capacity queue needs
> more resources, that seems to be a problem that should be solved by increasing its guarantee.
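The scenario in the description can be reduced to a few lines of arithmetic. Below is a minimal, self-contained sketch, not the actual ProportionalCapacityPreemptionPolicy code; the class names, fields, and the `toPreemptGuarded` method are hypothetical. It reproduces the reported behavior (5 resources preempted from B for already-over-capacity A) and the guard this discussion argues for (skip preemption when the demanding queue is already at or over its guarantee):

```java
// Hypothetical model of the example queues; numbers mirror the report.
class Queue {
    final String name;
    final int guaranteed, pending, current;

    Queue(String name, int guaranteed, int pending, int current) {
        this.name = name;
        this.guaranteed = guaranteed;
        this.pending = pending;
        this.current = current;
    }

    boolean overCapacity() { return current > guaranteed; }
}

public class PreemptionSketch {
    // Naive version of the monitor's decision: satisfy any pending demand
    // from any queue holding resources beyond its guarantee.
    static int toPreempt(Queue needy, Queue victim) {
        if (needy.pending == 0 || !victim.overCapacity()) return 0;
        int surplus = victim.current - victim.guaranteed;
        return Math.min(needy.pending, surplus);
    }

    // Guard proposed in this discussion (hypothetical): never preempt on
    // behalf of a queue that is already over its guaranteed capacity.
    static int toPreemptGuarded(Queue needy, Queue victim) {
        if (needy.overCapacity()) return 0;
        return toPreempt(needy, victim);
    }

    public static void main(String[] args) {
        Queue a = new Queue("A", 30, 5, 40);  // over capacity, still pending
        Queue b = new Queue("B", 30, 0, 50);  // over capacity, no demand
        System.out.println(toPreempt(a, b));        // 5: B's work is killed
        System.out.println(toPreemptGuarded(a, b)); // 0: A's demand ignored
    }
}
```

With the guard, queue A's extra demand is simply unmet bonus; B keeps its surplus until a queue that is actually under its guarantee asks for resources.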

This message was sent by Atlassian JIRA
