hadoop-yarn-issues mailing list archives

From "Varun Vasudev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1040) De-link container life cycle from an Allocation
Date Wed, 23 Mar 2016 15:28:25 GMT

    [ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208594#comment-15208594 ]

Varun Vasudev commented on YARN-1040:
-------------------------------------

Thanks for putting up the proposal [~asuresh]! 

bq. "ContainerId" becomes "AllocationId"
Is AllocationId a new class that we will introduce, or a rename of the existing ContainerId class? In either case we have some issues to sort out: the first one won't be backward compatible, and in the second case, will the NM generate container ids for the individual containers?

bq. An AM can receive only a single allocation on a Node, The Scheduler will "bundle" all
Allocations on a Node for an app into a single Large Allocation.
Can you explain why we need this restriction?

bq. Each Container is tagged with a "ContainerId" which is known only to the AM.
Are you referring to the current ContainerId class? If yes, why is it known only to the AM?

I actually agree with both Vinod and Bikas. The current approach is a little disruptive and not very useful for existing apps. I think we should separate the allocations work into its own classes on the RM and the NM, with new APIs added for both the RM and the NM. I don't think we can get away with modifying the existing APIs; the one exception is the allocate call, where we can add an additional flag to indicate whether an allocation or a container is desired.
Internally, we can change the implementation so that the container model uses allocations, but I think allocations will need their own state machine with slightly different semantics than containers (on both the RM and the NM).
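To make the proposed split concrete, here is a minimal Java sketch, assuming a hypothetical Allocation class whose own two-state lifecycle (ALLOCATED, RELEASED) is independent of the processes launched inside it. Allocation, ManagedProcess, launch(), onExit() and release() are illustrative names only, not existing YARN or NodeManager APIs. What it shows is the de-linking this issue asks for: a process exit does not end the allocation, so the AM can restart a process in place or run a second one alongside it, and only an explicit release returns the resources.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch only: none of these classes exist in YARN.
 * An allocation holds node resources and has its own state machine;
 * processes run inside it have a separate, shorter lifecycle.
 */
public class AllocationSketch {

    /** Hypothetical allocation states; release is an explicit AM action. */
    enum AllocationState { ALLOCATED, RELEASED }

    /** Hypothetical states of a single process inside an allocation. */
    enum ProcessState { RUNNING, EXITED }

    static final class ManagedProcess {
        final String id;
        ProcessState state = ProcessState.RUNNING;
        ManagedProcess(String id) { this.id = id; }
    }

    static final class Allocation {
        final String allocationId;
        private AllocationState state = AllocationState.ALLOCATED;
        private final List<ManagedProcess> processes = new ArrayList<>();
        private int nextProcess = 0;

        Allocation(String allocationId) { this.allocationId = allocationId; }

        /** Starts a new process; allowed while the allocation is held, even if others are running. */
        ManagedProcess launch() {
            if (state != AllocationState.ALLOCATED) {
                throw new IllegalStateException("allocation " + allocationId + " already released");
            }
            ManagedProcess p = new ManagedProcess(allocationId + "_p" + (nextProcess++));
            processes.add(p);
            return p;
        }

        /** A process exit only changes the process state; the allocation stays ALLOCATED. */
        void onExit(ManagedProcess p) { p.state = ProcessState.EXITED; }

        /** Only an explicit release ends the allocation; any remaining processes are marked exited. */
        void release() {
            for (ManagedProcess p : processes) {
                p.state = ProcessState.EXITED;
            }
            state = AllocationState.RELEASED;
        }

        AllocationState state() { return state; }

        /** Number of processes currently running inside this allocation. */
        long runningCount() {
            return processes.stream().filter(p -> p.state == ProcessState.RUNNING).count();
        }
    }

    public static void main(String[] args) {
        Allocation a = new Allocation("alloc_1");
        ManagedProcess server = a.launch(); // long-lived process, e.g. a region server
        a.onExit(server);                   // it exits, but the allocation is kept
        a.launch();                         // restarted on the same allocation (same node)
        a.launch();                         // a second process running in parallel
        System.out.println(a.state() + " running=" + a.runningCount());
        a.release();
        System.out.println(a.state() + " running=" + a.runningCount());
    }
}
```

Running main prints `ALLOCATED running=2` followed by `RELEASED running=0`. Whether release should also kill running processes, or be refused while any are alive, is exactly the kind of semantic difference from containers that would have to be settled for the real state machine.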

> De-link container life cycle from an Allocation
> -----------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>         Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that something could be run in the container while a long-lived process was already running. This can be useful in monitoring and reconfiguring the long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
