hadoop-yarn-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
Date Sat, 13 Apr 2013 02:44:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630893#comment-13630893 ]

Chris Douglas commented on YARN-45:

[~sandyr]: Yes, but the correct format/semantics for time are a complex discussion in themselves.
To keep this easy to review and the discussion focused, we were going to file that separately.
But I totally agree: for the AM to respond intelligently, the time before it's forced to give
up the container is valuable input.

[~bikash]: Agree almost completely. In YARN-569, the hysteresis you cite motivated several
design points, including multiple dampers on actions taken by the preemption policy, out-of-band
observation/enforcement, and no effort to fine-tune particular allocations. The role of preemption
(to summarize what [~curino] discussed in detail in the prenominate JIRA) is to make coarse
corrections around the core scheduler invariants (e.g., capacity, fairness). Rather than introducing
new races or complexity, one could argue that preemption is a dual of allocation in an inconsistent
environment.

Your proposal matches case (1) in the above [comment|https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950],
where the RM specifies the set of containers in jeopardy and a contract (as {{ResourceRequest}})
for avoiding the kills, should the AM have cause to pick different containers. Further, your
observation that the RM has enough information in priorities, etc. to make an educated guess
at those containers is spot-on. IIRC, the policy uses allocation order when selecting containers,
but that should be a secondary key after priority.
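
To make the ordering concrete, here is a minimal Python sketch of selecting containers by priority first and allocation order second. It is not the YARN-569 policy code: the {{Container}} record, its fields, and {{preemption_candidates}} are hypothetical names, and it assumes that a larger priority number means a less important container and that the most recently allocated container is reclaimed first within a priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """Hypothetical record; not the YARN Container class."""
    container_id: str
    priority: int        # assumption: larger number = less important
    allocation_seq: int  # assumption: larger number = allocated more recently
    memory_mb: int


def preemption_candidates(containers, memory_to_reclaim_mb):
    """Return containers to flag until the requested memory is covered.

    Primary key: priority (least important first).
    Secondary key: allocation order (most recent first).
    """
    ordered = sorted(containers, key=lambda c: (-c.priority, -c.allocation_seq))
    chosen, reclaimed = [], 0
    for c in ordered:
        if reclaimed >= memory_to_reclaim_mb:
            break
        chosen.append(c)
        reclaimed += c.memory_mb
    return chosen
```

With allocation order as the only key, an old low-value container could outlive a recent high-value one; sorting on the tuple keeps allocation order as a tie-breaker only.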

The disputed point, and I'm not sure we actually disagree, is the claim that the AM should
never kill things in response to this message. To be fair, that can be implemented by just
ignoring the requests, so it's orthogonal to this particular protocol, but it's certainly
an important "best practice" to discuss to ensure we're capturing the right thing. Certainly
there are many cases where ignoring the message is correct; most CDFs of map task execution
time show that over 80% finish in less than a minute, so the AM has few reasons to pessimistically
kill them.

There are a few scenarios where this isn't optimal. Take the case of YARN-415, where the AM
is billed cumulatively for cluster time. Assume an AM knows (a) the container will not finish
(reinforcing [~sandyr]'s point about including time in the preemption message) and (b) the
work done is not worth checkpointing. It can conclude that killing the container is in its
best interest, because squatting on the resource could affect its ability to get containers
in the future (or simply cost more). Moreover, for long-lived services and speculative container
allocation/retention, the AM may actually be holding the container only as an optimization
or for a future execution, so it could release it at low cost to itself. Finally, the time
allowed before the RM starts killing containers can be extended if AMs typically return resources
before the deadline.
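
The reasoning in the scenarios above can be sketched as a small decision function on the AM side. This is an illustration under stated assumptions, not MapReduce AM code: {{TaskEstimate}}, {{Action}}, and {{respond_to_preemption}} are hypothetical, and {{seconds_until_kill}} assumes the preemption message carries a deadline, which this JIRA leaves for a separate discussion.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"          # let the task run; it should finish in time
    CHECKPOINT = "checkpoint"  # save progress, then give the container back
    RELEASE = "release"        # give the container back immediately


@dataclass
class TaskEstimate:
    """Hypothetical AM-side view of one container's work."""
    seconds_to_finish: float
    checkpoint_worthwhile: bool
    held_speculatively: bool = False  # retained only as an optimization


def respond_to_preemption(task, seconds_until_kill):
    """Choose a response to a preemption request for one container."""
    if task.held_speculatively:
        # Releasing costs the AM little, so return the resource early.
        return Action.RELEASE
    if task.seconds_to_finish <= seconds_until_kill:
        # Most short tasks land here: ignoring the message is correct.
        return Action.IGNORE
    if task.checkpoint_worthwhile:
        return Action.CHECKPOINT
    # (a) will not finish and (b) not worth checkpointing: stop squatting.
    return Action.RELEASE
```

For example, a task needing 30 more seconds with 60 seconds of grace is left alone, while one needing 600 seconds with nothing worth checkpointing is released.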

It's also a mechanism for the RM to advise the AM about constraints that prevent it from granting
its pending requests. The AM currently kills reducers if it can't get containers to regenerate
lost map output. If the scheduler values some containers more than others, the AM's response
to starvation can improve on killing reducers at random. This is a case where the current implementation
acknowledges the fact that it already runs in an inconsistent environment.
> Scheduler feedback to AM to release containers
> ----------------------------------------------
>                 Key: YARN-45
>                 URL: https://issues.apache.org/jira/browse/YARN-45
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Chris Douglas
>            Assignee: Carlo Curino
>         Attachments: YARN-45.patch, YARN-45.patch
> The ResourceManager strikes a balance between cluster utilization and strict enforcement
> of resource invariants in the cluster. Individual allocations of containers must be reclaimed-
> or reserved- to restore the global invariants when cluster load shifts. In some cases, the
> ApplicationMaster can respond to fluctuations in resource availability without losing the
> work already completed by that task (MAPREDUCE-4584). Supplying it with this information would
> be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol
> for the RM to ask the AM to release containers.
> [1] http://research.yahoo.com/files/yl-2012-003.pdf
