hadoop-yarn-issues mailing list archives

From "Roni Burd (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5292) Support for PAUSED container state
Date Sat, 25 Jun 2016 00:49:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348963#comment-15348963 ]

Roni Burd commented on YARN-5292:

Chiming in on the use case.

An AM may use opportunistic containers (https://issues.apache.org/jira/browse/YARN-2882) for
several things:
1) Duplicate execution: race a slow/laggard container and see which one makes more progress.
2) Speculative execution: start future work by scavenging resources that are free and
above my current allocation.
3) Have a customer "pay" for opportunistic containers, which are "cheaper" than guaranteed
containers, fully knowing that the job has a lower SLA.

In all these cases, the Opportunistic token may get preempted. The question is what strategy
to choose on preemption:
1: Kill the container
2: Context-switch the container somehow
3: Move the container somewhere else

Strategies #2 and #3 are work-preserving, which matters for long-running batch jobs.
Imagine a stage of the job has been RUNNING for 10 min on an opportunistic container and has 1
minute left to run. It has already localized resources and processed a bunch of data, and then a
small 30s GUARANTEED container preempts it. KILL becomes very expensive. So more than a PAUSED
state, I think this is really a PREEMPTED state.
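
To make the arithmetic in that example concrete, here is a rough Java sketch. It is not YARN code: the class and method names are hypothetical, it assumes the killed stage reruns from scratch once the guaranteed container finishes, and the 5 s pause/resume overhead in the usage note is an invented figure for illustration only.

```java
import java.time.Duration;

/** Hypothetical sketch; none of these names exist in YARN. */
final class PreemptionCost {

    /**
     * Time until the stage finishes if the container is killed:
     * wait for the preempting container, then redo all elapsed work
     * and finish the remainder (assumes a full rerun from scratch).
     */
    static Duration finishAfterKill(Duration elapsed, Duration remaining,
                                    Duration preemptorRuntime) {
        return preemptorRuntime.plus(elapsed).plus(remaining);
    }

    /**
     * Time until the stage finishes if the container is paused:
     * wait for the preempting container, pay an assumed pause/resume
     * overhead, then finish only the remaining work.
     */
    static Duration finishAfterPause(Duration remaining, Duration preemptorRuntime,
                                     Duration pauseResumeOverhead) {
        return preemptorRuntime.plus(pauseResumeOverhead).plus(remaining);
    }
}
```

With the numbers from the example (10 min elapsed, 1 min remaining, a 30 s guaranteed container), finishAfterKill gives 11 min 30 s, while finishAfterPause with the assumed 5 s overhead gives 1 min 35 s.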

> Support for PAUSED container state
> ----------------------------------
>                 Key: YARN-5292
>                 URL: https://issues.apache.org/jira/browse/YARN-5292
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Hitesh Sharma
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add capability
> to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it remains
> until resources get freed up on the node; then the preempted container can resume to the running
> state.
> One scenario where this capability is useful is work preservation. How preemption is
> done, and whether the container supports it, is implementation specific.
> For instance, if the container is a virtual machine, then preempt would pause the VM
> and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to killing the
> container.
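
The lifecycle described in the quoted issue can be sketched as a small state machine. This is only an illustration under stated assumptions: PausableContainer, its State enum, and the supportsPause flag are hypothetical names, not part of the YARN API or of any patch attached to this JIRA.

```java
/** Hypothetical sketch of the proposed lifecycle; not actual YARN classes. */
final class PausableContainer {

    enum State { RUNNING, PAUSED, KILLED }

    // Whether the underlying container (e.g. a VM) can be paused and resumed.
    private final boolean supportsPause;
    private State state = State.RUNNING;

    PausableContainer(boolean supportsPause) {
        this.supportsPause = supportsPause;
    }

    State state() {
        return state;
    }

    /** Preempt a running container: pause it if supported, otherwise kill it. */
    void preempt() {
        if (state != State.RUNNING) {
            return; // only a running container can be preempted
        }
        state = supportsPause ? State.PAUSED : State.KILLED;
    }

    /** Called once resources are freed up on the node; no effect unless PAUSED. */
    void resume() {
        if (state == State.PAUSED) {
            state = State.RUNNING;
        }
    }
}
```

A container created with supportsPause set to true goes RUNNING, PAUSED, RUNNING across preempt() and resume(); one created with false goes straight to KILLED on preempt() and stays there.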

