hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5972) Add Support for Pausing/Freezing of containers
Date Tue, 06 Dec 2016 22:23:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726913#comment-15726913 ]

Arun Suresh commented on YARN-5972:

[~subru], thanks for taking a look. W.r.t. your [comment|https://issues.apache.org/jira/browse/YARN-5292?focusedCommentId=15685939]
on YARN-5292 ([~hrsharma], please feel free to chime in):
Based on my understanding of MAPREDUCE-4584, I feel that feature is actually quite orthogonal
to this, and I don't think one approach is necessarily better/worse than the other. They can
possibly be mixed and matched based on the use case.

While notifying an AM of containers that are about to be preempted does allow the AM to check-point
work, it does imply, as you pointed out, that AMs be modified to act on this input and make
some decisions based on it.

Container pausing/freezing on the other hand, given OS/VM level support (also exposed via
Docker and LXC) to actually freeze a process (agreed, their definition of freeze might vary),
is actually AM/application independent. This can be useful for applications and deployments
that do not really want to check-point on their own but at the same time like the idea of container
preemption with work preservation.
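
To make the OS-level mechanism concrete: with the cgroup v1 freezer controller, every task in a cgroup is suspended by writing FROZEN to that cgroup's freezer.state file and resumed by writing THAWED. Below is a minimal sketch, assuming the container's processes already live in their own freezer cgroup. The class name FreezerCgroup and the way the directory is passed in are hypothetical, not existing YARN code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of YARN: toggles a cgroup v1 freezer. */
final class FreezerCgroup {
  private final Path stateFile;

  /** cgroupDir is the container's freezer cgroup directory (assumed to exist). */
  FreezerCgroup(Path cgroupDir) {
    this.stateFile = cgroupDir.resolve("freezer.state");
  }

  /** Asks the kernel to suspend every task in the cgroup; the application is not involved. */
  void freeze() throws IOException {
    write("FROZEN");
  }

  /** Asks the kernel to let the tasks run again from where they stopped. */
  void thaw() throws IOException {
    write("THAWED");
  }

  /** Reads the state back; on a real cgroup this may be FREEZING while in progress. */
  String state() throws IOException {
    return new String(Files.readAllBytes(stateFile), StandardCharsets.UTF_8).trim();
  }

  private void write(String value) throws IOException {
    Files.write(stateFile, value.getBytes(StandardCharsets.UTF_8));
  }
}
```

Nothing in this path requires cooperation from the AM or the application, which is the sense in which pausing is application independent.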

Also, the NM container lifecycle and API changes in the Container Executor should ideally not
take into account the execution type of the containers. The trigger can either be from the
ContainerScheduler (in case of YARN-1011 and YARN-2877, when it decides resources are required
for a guaranteed container) or from an AM (the AM wants to play nice and relinquish resources
so that some opportunistic containers on the node can run).
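
The idea that the lifecycle should not care who triggers a pause, or what execution type the container has, can be sketched as a tiny state machine. All names here (SketchState, SketchTrigger, SketchExecType, PausableContainer) are hypothetical and only illustrate the shape; this is not the NodeManager's actual container state machine.

```java
/** Hypothetical names throughout; not the NodeManager's real state machine. */
enum SketchState { RUNNING, PAUSED }

/** Who asked for the transition. Recorded only, never branched on. */
enum SketchTrigger { CONTAINER_SCHEDULER, APPLICATION_MASTER }

enum SketchExecType { GUARANTEED, OPPORTUNISTIC }

final class PausableContainer {
  private final SketchExecType execType;
  private SketchState state = SketchState.RUNNING;
  private SketchTrigger lastTrigger;

  PausableContainer(SketchExecType execType) {
    this.execType = execType;
  }

  /** RUNNING -> PAUSED, identical for every execution type and trigger. */
  void pause(SketchTrigger by) {
    transition(SketchState.RUNNING, SketchState.PAUSED, by);
  }

  /** PAUSED -> RUNNING, again independent of execution type and trigger. */
  void resume(SketchTrigger by) {
    transition(SketchState.PAUSED, SketchState.RUNNING, by);
  }

  private void transition(SketchState from, SketchState to, SketchTrigger by) {
    if (state != from) {
      throw new IllegalStateException("Cannot move to " + to + " from " + state);
    }
    state = to;
    lastTrigger = by;
  }

  SketchState state() { return state; }
  SketchTrigger lastTrigger() { return lastTrigger; }
  SketchExecType execType() { return execType; }
}
```

The execution type is stored but never consulted in a transition, so the same pause/resume path serves both a scheduler-initiated pause of an opportunistic container and an AM-initiated pause of a guaranteed one.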

Even though this is currently targeted for opportunistic containers, I don't really see any
problems exposing this to AMs via the ContainerManagementProtocol (though the devil is in
the details)
bq. We cannot guarantee RESUME unless we block the allocation for the Container which IMHO
defeats the purpose
Not sure I completely agree. If an AM pauses a guaranteed container, yes, the allocation is
blocked, but this is no different from an AM starting a container running a never-ending sleep
job, except this has the advantage that the NM is aware of it and can use the relinquished
resources to start any queued opportunistic containers. Since the container is guaranteed,
resume is ensured, since any opportunistic container that was running due to the graciousness
of the AM would immediately be preempted.
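
The resource accounting behind this argument can be sketched as follows. It is a deliberately simplified model: containers are represented only by their memory size in MB, the guaranteed allocations are assumed to fit within the node's capacity, and NodeSketch and its methods are hypothetical names rather than NM code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of one node; containers are just memory sizes in MB. */
final class NodeSketch {
  private final int capacityMb;
  private int usedMb;
  private final Deque<Integer> queuedOpportunistic = new ArrayDeque<>();
  private final Deque<Integer> runningOpportunistic = new ArrayDeque<>();

  NodeSketch(int capacityMb) {
    this.capacityMb = capacityMb;
  }

  /** Assumes the scheduler never over-commits guaranteed resources. */
  void startGuaranteed(int mb) {
    usedMb += mb;
  }

  void queueOpportunistic(int mb) {
    queuedOpportunistic.addLast(mb);
    startQueued();
  }

  /** The paused container's resources are lent to queued opportunistic work. */
  void pauseGuaranteed(int mb) {
    usedMb -= mb;
    startQueued();
  }

  /** Preempts opportunistic containers until the guaranteed one fits again. */
  List<Integer> resumeGuaranteed(int mb) {
    List<Integer> preempted = new ArrayList<>();
    while (freeMb() < mb && !runningOpportunistic.isEmpty()) {
      int victim = runningOpportunistic.removeLast();
      usedMb -= victim;
      queuedOpportunistic.addFirst(victim); // back to the head of the queue
      preempted.add(victim);
    }
    if (freeMb() < mb) {
      throw new IllegalStateException("guaranteed resources were over-committed");
    }
    usedMb += mb;
    return preempted;
  }

  private void startQueued() {
    while (!queuedOpportunistic.isEmpty() && queuedOpportunistic.peekFirst() <= freeMb()) {
      int mb = queuedOpportunistic.removeFirst();
      usedMb += mb;
      runningOpportunistic.addLast(mb);
    }
  }

  int freeMb() { return capacityMb - usedMb; }
  int runningOpportunisticCount() { return runningOpportunistic.size(); }
}
```

Under these assumptions the resume can only fail if guaranteed resources were over-committed in the first place, since every opportunistic container is a valid preemption victim.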

> Add Support for Pausing/Freezing of containers
> ----------------------------------------------
>                 Key: YARN-5972
>                 URL: https://issues.apache.org/jira/browse/YARN-5972
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Arun Suresh
>            Assignee: Hitesh Sharma
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add capability
to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> Instead of preempting a running container, the container can be moved to a PAUSED state,
where it remains until resources get freed up on the node; the preempted container can then
resume to the running state.
> Note that process freezing is already supported by 'cgroups freezer' which is used
internally by the docker pause functionality. Windows also has OS level support of a similar
> One scenario where this capability is useful is work preservation. How preemption is
done, and whether the container supports it, is implementation specific.
> For instance, if the container is a virtual machine, then preempt call would pause the
VM and resume would restore it back to the running state.
> If the container executor / runtime doesn't support preemption, then preempt would default
to killing the container. 
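
The fallback described in the issue (pause when the runtime supports it, otherwise kill) could look roughly like the sketch below. PreemptableRuntime, VmRuntimeSketch and PlainRuntimeSketch are hypothetical names introduced for illustration; this is not the real ContainerExecutor API, and the two implementations only record the calls they receive.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interface, not the real ContainerExecutor API. */
interface PreemptableRuntime {
  void kill(String containerId);

  /** Runtimes that can freeze a container override this and pause(). */
  default boolean supportsPause() {
    return false;
  }

  default void pause(String containerId) {
    throw new UnsupportedOperationException("pause not supported");
  }

  /** Pauses if possible, otherwise falls back to killing. Returns what was done. */
  default String preempt(String containerId) {
    if (supportsPause()) {
      pause(containerId);
      return "PAUSED";
    }
    kill(containerId);
    return "KILLED";
  }
}

/** Stand-in for a VM-backed runtime; it only records the calls it receives. */
final class VmRuntimeSketch implements PreemptableRuntime {
  final List<String> calls = new ArrayList<>();

  @Override public boolean supportsPause() { return true; }
  @Override public void pause(String containerId) { calls.add("pause " + containerId); }
  @Override public void kill(String containerId) { calls.add("kill " + containerId); }
}

/** Stand-in for a runtime with no freeze support. */
final class PlainRuntimeSketch implements PreemptableRuntime {
  final List<String> calls = new ArrayList<>();

  @Override public void kill(String containerId) { calls.add("kill " + containerId); }
}
```

Keeping the fallback in the default preempt method means callers never need to know whether a given runtime can pause.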

This message was sent by Atlassian JIRA

