From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
Date Tue, 09 Apr 2013 02:29:14 GMT

[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626139#comment-13626139 ]

Carlo Curino commented on YARN-45:
----------------------------------


High-level idea:

The philosophy behind preemption is that we give the AM a heads-up about resources that are
likely to be taken away, and give it the opportunity to save the state of its tasks. A separate
kill-based mechanism already exists (leveraged by the FairScheduler for its preemption) to
forcibly recover containers.
This is our fallback if the AM does not release containers within a certain amount of time (note
that, because cluster conditions evolve quickly, we might not kill a container if at a later
time we realize it is not strictly needed to achieve fairness/capacity). This means an AM can
be written while completely ignoring preemption hints and would still work correctly
(although it might waste useful work).

The goal is to allow for smart local policies in the AM, which leverage application-level
understanding of the ongoing computation to cope with an imminent reduction of resources (e.g.,
by saving the state of the computation to a checkpoint, by promoting partial output, by migrating
responsibilities to other tasks, or by trying to complete the work quickly). The goal is to spare the
RM from having to understand application-level optimization concerns and let it focus on resource
management issues. As a consequence, we envision (among others) preemption requests that are
not fully bound to specific containers, allowing the AM some flexibility. Note that the significant
"lag" imposed by the heartbeat protocols between RM-AM, AM-Tasks, and NM-RM forces us, in most
cases, to consider preemption actions over a rather long time horizon. We cannot expect to operate
in a tight sub-second control loop, but rather to trigger changes in the cluster allocation on the
order of tens of seconds. As a consequence, preemption should be used to correct macroscopic issues
that are likely to be somewhat stable over time, rather than to micro-manage container allocations.

We consider the following use cases for preemption:
# Scheduling policies aimed at rebalancing some global property such as capacity or fairness.
This allows, for example, going over capacity on a queue and getting resources back as the cluster
conditions change.
# Scheduling policies that make point decisions about individual containers (e.g., preempting
a container on a machine and restarting it elsewhere to improve data locality, or preempting
containers on a box that is experiencing excessive I/O).
# Administrative actions aimed at modifying the cluster allocations without wasting
work (e.g., draining a machine or a rack before taking it offline for maintenance, manually
reducing the allocation for a job, etc.).

Use cases 1 and 3 can be implemented either by picking containers at the RM, or by expressing a "broad"
request for a certain amount of resources (we reuse the ResourceRequest for this, in a way
that is symmetric to the AM's own requests) and letting the AM bind it to specific containers.
Use case 2, on the other hand, is more likely to be implemented using ContainerIds.
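
To make the two flavors concrete, here is a minimal, self-contained sketch; the classes below are simplified stand-ins for the real ContainerId and ResourceRequest records (which live in org.apache.hadoop.yarn.api.records), and the helper names are purely illustrative, not part of the patch:

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Simplified stand-ins for the YARN records named above; the real ContainerId
// and ResourceRequest carry more fields than shown here.
class ContainerIdStub {
  final long id;
  ContainerIdStub(long id) { this.id = id; }
}

class ResourceRequestStub {
  final String location;   // e.g. "*" for anywhere, or a specific host/rack
  final int memoryMb;      // size of each container we would like back
  final int numContainers; // how many such containers we would like back
  ResourceRequestStub(String location, int memoryMb, int numContainers) {
    this.location = location;
    this.memoryMb = memoryMb;
    this.numContainers = numContainers;
  }
}

class PreemptionFlavors {
  // Use cases 1 and 3: a "broad" ask, symmetric to the AM's own requests;
  // the AM picks which of its containers to bind to it.
  static Set<ResourceRequestStub> broadRequest() {
    return Collections.singleton(new ResourceRequestStub("*", 1024, 4));
  }

  // Use case 2: the RM names specific containers (e.g. on an overloaded box).
  static Set<ContainerIdStub> targetedRequest() {
    return new HashSet<ContainerIdStub>(
        Arrays.asList(new ContainerIdStub(7), new ContainerIdStub(12)));
  }
}
{code}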



Protocol change proposal:

Our proposal consists of extending the ResourceResponse with a PreemptRequest message (further
extensible in the future) that contains a Set<ContainerId> and a Set<ResourceRequest>.
The current semantics is that these two sets are non-overlapping (i.e., if we ask for a specific
container and for a ResourceRequest, the AM is supposed to satisfy both). Once again, since we never
rely on the AM to "enforce" preemption and we have a kill-based fallback, the AM implementation
is not required to understand the preemption requests (nor even to acknowledge receiving them).
This makes for a simple upgrade story, and one could run a mix of preemption-aware and
non-preemption-aware AMs on the same cluster.
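
As a rough, self-contained illustration of the message shape described above and of how a preemption-aware AM might consume it (everything below is a sketch: the getter names, field types, and AM-side helpers are assumptions for illustration, not the actual patch):

{code:java}
import java.util.Collections;
import java.util.Set;

// Sketch of the proposed message: two non-overlapping sets, and the AM is
// expected to satisfy both (plain stand-in element types are used here).
class PreemptRequestSketch {
  Set<Long> containerIds = Collections.emptySet();       // stand-in for Set<ContainerId>
  Set<String> resourceRequests = Collections.emptySet(); // stand-in for Set<ResourceRequest>
}

class PreemptionAwareAm {
  // Called on every AM-RM heartbeat response. Ignoring the hint entirely is
  // also legal: the RM eventually falls back to kill-based preemption.
  void onPreemptRequest(PreemptRequestSketch preempt) {
    if (preempt == null) {
      return; // nothing is being asked of us this round
    }
    // 1) Containers the RM named explicitly: save state, then give them back.
    for (Long id : preempt.containerIds) {
      checkpointTask(id);
      releaseContainer(id);
    }
    // 2) "Broad" requests: the AM chooses which of its own containers
    //    (ideally the cheapest to give up) cover the requested amount.
    for (String rr : preempt.resourceRequests) {
      for (Long id : pickCheapestContainersFor(rr)) {
        checkpointTask(id);
        releaseContainer(id);
      }
    }
  }

  // Hypothetical helpers, intentionally left as no-ops in this sketch.
  void checkpointTask(Long containerId) { /* e.g. checkpoint task state */ }
  void releaseContainer(Long containerId) { /* report the container as released */ }
  Set<Long> pickCheapestContainersFor(String resourceRequest) {
    return Collections.emptySet();
  }
}
{code}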

An open question on which we would like input is whether the PreemptRequest should be
a union type carrying either set (but not both together), or whether to allow, as we
do in the attached patch, both to coexist in the same PreemptRequest. We do not currently
have a need for the "both" use case, but maybe others do. Thoughts?
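
For concreteness, the two alternatives could be encoded roughly as follows (again a purely illustrative Java sketch with placeholder element types, not the actual patch):

{code:java}
import java.util.Set;

// Alternative A (the attached patch): one message, both sets may be populated.
class PreemptRequestCombined {
  Set<Long> containerIds;       // stand-in for Set<ContainerId>
  Set<String> resourceRequests; // stand-in for Set<ResourceRequest>
}

// Alternative B: a union type, carrying only one of the two shapes per message.
abstract class PreemptRequestUnion {}

class PreemptByContainerId extends PreemptRequestUnion {
  Set<Long> containerIds;
}

class PreemptByResourceRequest extends PreemptRequestUnion {
  Set<String> resourceRequests;
}
{code}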



Coming up next:

We are cleaning up further patches to the FairScheduler, CapacityScheduler, and ApplicationMasterService
that leverage this AM-RM protocol, as well as changes to the MapReduce AM that implement work-saving
preemption via checkpointing for the Shuffle and Reduce phases (for Mappers we currently just
"make a run for it", given the typically short runtime of map tasks). The other patches will be
posted soon.

                
> Scheduler feedback to AM to release containers
> ----------------------------------------------
>
>                 Key: YARN-45
>                 URL: https://issues.apache.org/jira/browse/YARN-45
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Chris Douglas
>         Attachments: YARN-45.patch
>
>
> The ResourceManager strikes a balance between cluster utilization and strict enforcement
of resource invariants in the cluster. Individual allocations of containers must be reclaimed,
or reserved, to restore the global invariants when cluster load shifts. In some cases, the
ApplicationMaster can respond to fluctuations in resource availability without losing the
work already completed by that task (MAPREDUCE-4584). Supplying it with this information would
be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol
for the RM to ask the AM to release containers.
> [1] http://research.yahoo.com/files/yl-2012-003.pdf

