hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1040) De-link container life cycle from an Allocation
Date Fri, 25 Mar 2016 08:56:25 GMT

    [ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211613#comment-15211613 ]

Arun Suresh commented on YARN-1040:

First, thank you [~vinodkv], [~vvasudev] and [~bikassaha] for reviewing the doc and chiming
in with your thoughts.


bq. can see the argument of asking users to use new APIs for new features but requiring existing
apps to change their AM/RM implementations….
We might not actually need to do this. We can ensure that the existing external-facing methods
on the ContainerManagementProtocol and ApplicationMasterProtocol work as expected, and introduce
the new methods in wrapper protocols. Apps that need the new functionality can use the new API,
and those that don’t can stick with the old one (until a major release, when we can retire
the old protocols). We have tried something along the same lines in YARN-2885 (not committed
to trunk yet), where we have a DistributedSchedulingProtocol that extends the ApplicationMasterProtocol
and still exposes the old API.
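
To make the wrapper-protocol idea concrete, here is a minimal sketch in plain Java. All names in it (LegacyAmProtocol, ExtendedAmProtocol, SketchAmService, allocateReusable) are hypothetical stand-ins and not the real YARN interfaces or signatures; the only point is that a protocol extending the old one keeps every old method available while adding new ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an existing protocol (not the real ApplicationMasterProtocol).
interface LegacyAmProtocol {
    // Existing call that old applications keep using unchanged.
    List<String> allocate(int count);
}

// Hypothetical wrapper protocol: inherits every old method and adds a new one.
interface ExtendedAmProtocol extends LegacyAmProtocol {
    // New call, used only by apps that opt in to the new API.
    String allocateReusable(String node);
}

// A single service implementation can then serve both kinds of clients.
final class SketchAmService implements ExtendedAmProtocol {
    private int nextId = 0;

    @Override
    public List<String> allocate(int count) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add("container_" + nextId++);
        }
        return ids;
    }

    @Override
    public String allocateReusable(String node) {
        return "allocation_" + nextId++ + "@" + node;
    }
}
```

An old app would hold the service as a LegacyAmProtocol and never see allocateReusable, while a new app would hold it as an ExtendedAmProtocol and could call both methods.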

bq. ..just to be able to launch multiple processes does not seem empathetic.
Hmm. Given that launching multiple processes is a new feature, I feel it should
be fine to mandate that the app use the new APIs, no?


bq. Is AllocationId a new class that we will introduce or a rename of the existing ContainerId
I expect it to be a new class, but my thinking was that it should replace the existing ContainerId
in the RM. To preserve backward compatibility, for apps using the older API, we can somehow
transform the AllocationId into a ContainerId when the RM responds to the app.

bq. will the NM generate container ids for the individual containers?
That was my plan. As mentioned above, for older apps, ContainerId = AllocationId + '-0', and
for apps requesting multiple containers per allocation, ContainerId = AllocationId + '-'
+ index (an id incrementing from 0).
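
The naming rule above could be sketched as follows. This is only an illustration using plain strings: the class ContainerIdSketch, its method names, and the "allocation_01" id format are assumptions, and the real ContainerId is a structured class rather than a string.

```java
// Hypothetical helper illustrating the proposed id scheme; not part of YARN.
final class ContainerIdSketch {
    private ContainerIdSketch() {}

    // Older apps: exactly one container per allocation, always index 0.
    static String legacyContainerId(String allocationId) {
        return containerId(allocationId, 0);
    }

    // Newer apps: the index increments from 0 within a single allocation.
    static String containerId(String allocationId, int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be >= 0");
        }
        return allocationId + "-" + index;
    }
}
```

For example, under these assumptions an older app holding allocation "allocation_01" would see container "allocation_01-0", and a newer app starting a third container on the same allocation would see "allocation_01-2".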

bq. An AM can receive only a single allocation on a Node, The Scheduler will "bundle" all
Allocations on a Node for an app into a single Large Allocation.
My thinking was that, given this feature will allow an app to start multiple containers using
a single allocation, an app can now reuse the same allocation to start a new container, rather
than obtain a new allocation. This will minimize the number of Allocations the RM needs
to give out.
Thinking further, I understand how this might break backward compatibility (for apps using
the older API and expecting multiple ContainerTokens on the same node), so I guess we can
remove this restriction and make sure the "bundling" happens only for apps using the new API.

bq. Are you referring to the current ContainerId class? If yes, why is it known only to the
This also concerns the points [~vinodkv] brought up about container exit notifications.
*Today* the ContainerId is known to the RM, since:
* The RM generates the ContainerId, so it obviously needs to know about it.
* The primary means of the RM reclaiming resources from a Node, to schedule waiting apps,
is when it receives a Container Complete / Killed notification from the Node heartbeat,
for which the ContainerId is necessary for matching the container resource.
* This is also the primary means of the AM being notified of a completed / killed container,
viz. via the RM allocateResponse.

In the new scheme of things:
* An Allocation technically never "Completes", unless the AM explicitly deactivates it, at
which point the Node can notify the RM of the terminated Allocation.
* For backward compatibility, Single-use allocations will automatically be deactivated and
notified to the RM when the associated container completes.
* An AM on restart / failover will be notified by the RM of existing Allocations and can query
the NM directly for the status of individual containers.
* An NM on restart need not report the status of every container, just the Allocations that
were active on the NM. The respective AMs can then query the NM and obtain the status of each container.

For the above cases, the RM does not need to know about the ContainerId per se, only the AllocationId.
The only other case I could think of for the RM knowing about the individual container is
for the case of smarter pre-emption, where the RM can pick specific containers within an Allocation
to be killed rather than the Allocation itself (I had mentioned this in the doc too I guess).
But I guess that can be addressed in subsequent iterations.
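
The allocation life cycle listed above (an Allocation stays active until the AM deactivates it, except that single-use allocations are deactivated automatically when their container completes) can be sketched roughly as below. AllocationSketch and all of its methods are hypothetical names used for illustration; this is not how the NM is implemented, just one way to model the rules.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical NM-side model of one Allocation; not actual YARN code.
final class AllocationSketch {
    private final String id;
    private final boolean singleUse; // true for apps on the older API
    private final Set<String> running = new LinkedHashSet<>();
    private int nextIndex = 0;
    private boolean active = true;

    AllocationSketch(String id, boolean singleUse) {
        this.id = id;
        this.singleUse = singleUse;
    }

    // Starts a container inside this allocation and returns its id.
    String startContainer() {
        if (!active) {
            throw new IllegalStateException("allocation " + id + " is not active");
        }
        if (singleUse && nextIndex > 0) {
            throw new IllegalStateException("single-use allocation already used");
        }
        String containerId = id + "-" + nextIndex++;
        running.add(containerId);
        return containerId;
    }

    // Returns true when the allocation itself has ended and the RM should be notified.
    boolean containerCompleted(String containerId) {
        if (!running.remove(containerId)) {
            throw new IllegalArgumentException("unknown container " + containerId);
        }
        if (singleUse) {
            active = false; // backward-compatible behaviour: allocation ends with its container
            return true;
        }
        return false; // the RM does not need to hear about individual containers
    }

    // Explicit deactivation by the AM; returns true if the RM should be notified.
    boolean deactivate() {
        boolean wasActive = active;
        active = false;
        return wasActive;
    }

    boolean isActive() {
        return active;
    }
}
```

In this model the RM only ever hears about an AllocationId: once for a single-use allocation when its container completes, and once for a reusable allocation when the AM deactivates it.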



You brought up some good points, will incorporate them into the doc.

If you guys are fine with it, I plan to open separate JIRAs under YARN-4726 to break up this
work. I feel we can have more focused discussions there on specific aspects of the design.

> De-link container life cycle from an Allocation
> -----------------------------------------------
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>         Attachments: YARN-1040-rough-design.pdf
> The AM should be able to exec >1 process in a container, rather than have the NM automatically
release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, which for HBase
would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that something
could be run in the container while a long-lived process was already running. This can be
useful in monitoring and reconfiguring the long-lived process, as well as shutting it down.

This message was sent by Atlassian JIRA
