hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5292) Support for PAUSED container state
Date Sun, 20 Nov 2016 14:18:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15681223#comment-15681223 ]

Arun Suresh commented on YARN-5292:

Thanks for the patch, [~hrsharma].

Did a fly-by of the patch and the design doc. Some design comments:
# The original intent of the JIRA, I guess, is to provide an alternative to killing opportunistic
containers to make room for guaranteed containers. This implies that we would need to wire
this through the ContainerScheduler, which is now the entity that decides when to kill
opportunistic containers and which ones to kill.
# I was thinking we could also expose an API on the ContainerManagementProtocol to allow
AMs to directly pause a container, but I am guessing this should be allowed only for Guaranteed
containers: if we expose a pause API, we should also expose a resume API, and there is no
guarantee that opportunistic containers can be resumed at the time the AM needs them. [~jianhe],
[~vvasudev], it would be nice to hear your thoughts on this, since, if I understand correctly,
yarn native services need to just stop a container (without losing the allocation)
for a period of time. I don't know if that can be modeled as a container PAUSE via some support
from the underlying ContainerExecutor/Runtime.
# We need some way to expose which resources are reclaimable by the NM when a container is paused.
On deployments using some implementations of the ContainerExecutor/Runtime, it is possible
that not all resources of a paused container will be reclaimable by the NM to start other
opportunistic/guaranteed containers. For example, on some systems vcores may be throttled
to 0 for the container, while on others the memory / state is also dumped into a secondary
store, which means the memory might also be reclaimable. We would need some way to plug this
information into the ResourceUtilizationTracker and the ContainerScheduler.
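
To make point 3 concrete, here is a minimal Java sketch of how a runtime might report what it frees on pause. Every name in it ({{ResourceType}}, {{PauseReclaimPolicy}}, {{CpuThrottlePolicy}}, {{CheckpointPolicy}}) is hypothetical and made up for illustration; none of it is an existing YARN API or part of the patch.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical resource dimensions; this is NOT YARN's Resource class. */
enum ResourceType { MEMORY_MB, VCORES }

/** Hypothetical plug-in point: what a runtime gives back when it pauses a container. */
interface PauseReclaimPolicy {
  Map<ResourceType, Long> reclaimable(Map<ResourceType, Long> allocated);
}

/** Runtime that only throttles CPU to 0: vcores come back, memory stays pinned. */
final class CpuThrottlePolicy implements PauseReclaimPolicy {
  @Override
  public Map<ResourceType, Long> reclaimable(Map<ResourceType, Long> allocated) {
    Map<ResourceType, Long> out = new EnumMap<>(ResourceType.class);
    out.put(ResourceType.MEMORY_MB, 0L);
    out.put(ResourceType.VCORES, allocated.getOrDefault(ResourceType.VCORES, 0L));
    return out;
  }
}

/** Runtime that also dumps memory / state to a secondary store: both come back. */
final class CheckpointPolicy implements PauseReclaimPolicy {
  @Override
  public Map<ResourceType, Long> reclaimable(Map<ResourceType, Long> allocated) {
    Map<ResourceType, Long> out = new EnumMap<>(ResourceType.class);
    for (ResourceType type : ResourceType.values()) {
      out.put(type, allocated.getOrDefault(type, 0L));
    }
    return out;
  }
}
```

Under this assumption, the scheduler would ask the policy how much headroom a pause actually creates before deciding whether pausing is enough to start the incoming container.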

I am thinking we should maybe convert this to an umbrella JIRA, create the work items as
sub-JIRAs against it, and do the work on a branch.

With regard to the patch itself, I understand the current one is meant to handle the changes
needed in the state machines etc. Do take a look at the {{TestContainer}} class and see if it
is possible to add some tests to verify that container life-cycle events are handled correctly.
I will take a deeper look at the patch after that.
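
As a rough illustration of the kind of life-cycle check meant here, the sketch below models only the pause/resume/kill transitions in a toy state machine. The names ({{SketchState}}, {{SketchEvent}}, {{ContainerLifecycleSketch}}) and the exact transition set are assumptions for illustration; they are not the NM's ContainerImpl state machine or the states added by the patch.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, reduced set of states and events; not the NM's real enums. */
enum SketchState { RUNNING, PAUSED, KILLING }
enum SketchEvent { PAUSE, RESUME, KILL }

final class ContainerLifecycleSketch {
  private static final Map<SketchState, Map<SketchEvent, SketchState>> TRANSITIONS =
      new EnumMap<>(SketchState.class);

  static {
    for (SketchState s : SketchState.values()) {
      TRANSITIONS.put(s, new EnumMap<>(SketchEvent.class));
    }
    // A running container can be paused or killed.
    TRANSITIONS.get(SketchState.RUNNING).put(SketchEvent.PAUSE, SketchState.PAUSED);
    TRANSITIONS.get(SketchState.RUNNING).put(SketchEvent.KILL, SketchState.KILLING);
    // A paused container can be resumed or killed.
    TRANSITIONS.get(SketchState.PAUSED).put(SketchEvent.RESUME, SketchState.RUNNING);
    TRANSITIONS.get(SketchState.PAUSED).put(SketchEvent.KILL, SketchState.KILLING);
  }

  private SketchState state = SketchState.RUNNING;

  SketchState getState() {
    return state;
  }

  /** Applies the event, or throws if it is not valid in the current state. */
  SketchState handle(SketchEvent event) {
    SketchState next = TRANSITIONS.get(state).get(event);
    if (next == null) {
      throw new IllegalStateException(event + " is not valid in state " + state);
    }
    state = next;
    return state;
  }
}
```

A test in this spirit would drive a container through PAUSE and RESUME, assert the state after each event, and also assert that invalid events (for example RESUME while RUNNING) are rejected.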

> Support for PAUSED container state
> ----------------------------------
>                 Key: YARN-5292
>                 URL: https://issues.apache.org/jira/browse/YARN-5292
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Hitesh Sharma
>            Assignee: Hitesh Sharma
>         Attachments: YARN-5292.001.patch, YARN-5292.002.patch, YARN-5292.003.patch, yarn-5292.pdf
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add capability
to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it remains
until resources get freed up on the node then the preempted container can resume to the running
> One scenario where this capability is useful is work preservation. How preemption is
done, and whether the container supports it, is implementation specific.
> For instance, if the container is a virtual machine, then preempt would pause the VM
and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to killing the

