Date: Fri, 25 Mar 2016 08:56:25 +0000 (UTC)
From: "Arun Suresh (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1040) De-link container life cycle from an Allocation

    [ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211613#comment-15211613 ]

Arun Suresh commented on YARN-1040:
-----------------------------------

Firstly, thank you [~vinodkv], [~vvasudev] and [~bikassaha] for reviewing the doc and chiming in with your thoughts.

[~bikassaha]
bq. can see the argument of asking users to use new APIs for new features but requiring existing apps to change their AM/RM implementations….

We might not actually need to do this, if we ensure that the existing external-facing methods on the ContainerManagementProtocol and ApplicationMasterProtocol keep working as expected and introduce the new methods in wrapper protocols. Apps that need the new functionality can use the new API, and those that don't can stick with the old one (until a major release, when we can retire the old protocols). We have tried something along the same lines in YARN-2885 (not committed to trunk yet), where we have a DistributedSchedulingProtocol that extends the ApplicationMasterProtocol and still exposes the old API.

bq. ..just to be able to launch multiple processes does not seem empathetic.

Hmmm.. given that launching multiple processes is a new feature, I feel it should be fine to mandate that apps use the new APIs, no?

----
[~vvasudev]
bq. Is AllocationId a new class that we will introduce or a rename of the existing ContainerId class?

I expect it to be a new class, but my thinking was that it should replace the existing ContainerId in the RM. To preserve backward compatibility, for apps using the older API, we can somehow transform the AllocationId into a ContainerId when the RM responds to the app.

bq. will the NM generate container ids for the individual containers?

That was my plan. As mentioned above, for older apps, ContainerId = AllocationId + '-0', and for apps requesting multiple containers per allocation, ContainerId = AllocationId + '-' + index (an id incrementing from 0); a quick sketch follows below.
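To make the id scheme concrete, here is a minimal sketch of the derivation, treating the AllocationId as an opaque string. The class and method names are made up for illustration; nothing like this exists in the codebase today.

{code:java}
// Hypothetical sketch of the proposed "AllocationId + '-' + index" scheme.
// Neither this class nor containerIdFor() exists in YARN; names are invented.
public final class AllocationIds {

  /**
   * Derives the id the NM would generate for the index-th container
   * started under an allocation. Index 0 is what an older app, which
   * only ever sees one container per allocation, would observe.
   */
  static String containerIdFor(String allocationId, int index) {
    if (index < 0) {
      throw new IllegalArgumentException("index must be >= 0");
    }
    return allocationId + "-" + index;
  }

  public static void main(String[] args) {
    // Older apps: exactly one container per allocation, always suffix -0.
    System.out.println(containerIdFor("allocation_1458895200_0001", 0));
    // New API: the NM hands out an incrementing index per allocation.
    System.out.println(containerIdFor("allocation_1458895200_0001", 1));
  }
}
{code}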
bq. An AM can receive only a single allocation on a Node, The Scheduler will "bundle" all Allocations on a Node for an app into a single Large Allocation.

My thinking was that, given this feature will allow an app to start multiple containers using a single allocation, an app can now reuse the same allocation to start a new container rather than obtain a new allocation. This will minimize the number of Allocations the RM would need to give out.
Thinking further, I understand how this might break backward compatibility (for apps using the older API and expecting multiple ContainerTokens on the same node), so I guess we can remove this restriction and make sure the "bundling" happens only for apps using the new API.

bq. Are you referring to the current ContainerId class? If yes, why is it known only to the AM?

This also concerns the points [~vinodkv] brought up about container exit notifications.
*Today* the ContainerId is known to the RM, since:
* The RM generates the ContainerId, so it obviously needs to know about it.
* The primary means of the RM reclaiming resources from a Node, to schedule waiting apps, is when it receives a Container Complete / Killed notification from the Node heartbeat, for which the ContainerId is necessary to match the container's resource.
* This is also the primary means of the AM being notified of a completed / killed container, viz. via the RM allocateResponse.

In the new scheme of things (rules sketched in code below):
* An Allocation technically never "completes" unless the AM explicitly deactivates it, at which point the Node can notify the RM of the terminated Allocation.
* For backward compatibility, single-use allocations will automatically be deactivated, and the RM notified, when the associated container completes.
* An AM on restart / failover will be notified by the RM of existing Allocations and can query the NM directly for the status of individual containers.
* An NM on restart need not report the status of every container, just the Allocations that were active on the NM. The respective AMs can then query the NM and obtain the status of the containers.

For the above cases, the RM does not need to know about the ContainerId per se, only the AllocationId. The only other case I could think of where the RM needs to know about individual containers is smarter pre-emption, where the RM can pick specific containers within an Allocation to be killed rather than the Allocation itself (I had mentioned this in the doc too, I guess). But that can be addressed in subsequent iterations.
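Here is a rough sketch of those lifecycle rules, i.e., explicit deactivation by the AM versus automatic deactivation of single-use allocations. The tracker class and all method names are invented for illustration and do not correspond to anything in the YARN codebase.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical NM-side bookkeeping for one allocation; names are invented.
final class AllocationTracker {

  private final boolean singleUse; // allocations made via the old API
  private final Set<String> liveContainers = new HashSet<>();
  private boolean active = true;

  AllocationTracker(boolean singleUse) {
    this.singleUse = singleUse;
  }

  void containerStarted(String containerId) {
    liveContainers.add(containerId);
  }

  /**
   * Called when a container exits. Only single-use allocations are
   * auto-deactivated here, which is what preserves old-API semantics;
   * new-API allocations stay active for reuse.
   */
  void containerCompleted(String containerId) {
    liveContainers.remove(containerId);
    if (singleUse && liveContainers.isEmpty()) {
      deactivate();
    }
  }

  /**
   * Explicit deactivation by the AM. This, not a container exit, is the
   * event that would be reported to the RM as a terminated Allocation.
   */
  void deactivate() {
    if (active) {
      active = false;
      notifyRmOfTerminatedAllocation();
    }
  }

  private void notifyRmOfTerminatedAllocation() {
    // Placeholder: in the proposal this would ride on the NM-RM heartbeat.
  }
}
{code}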
----
[~vinodkv]
You brought up some good points; I will incorporate them into the doc.

If you guys are fine with it, I plan to open separate JIRAs under YARN-4726 breaking up this work. I feel we can have more focused discussions there on specific aspects of the design.


> De-link container life cycle from an Allocation
> -----------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>        Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that something could be run in the container while a long-lived process was already running. This can be useful for monitoring and reconfiguring the long-lived process, as well as shutting it down.
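To ground the issue description above, here is a hypothetical sketch of what running (and restarting) more than one process inside a still-alive container could look like. This is plain java.lang.Process plumbing for illustration only; it is not YARN API, and none of these names exist in the ContainerManagementProtocol.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical container-side process manager; names are invented.
final class MultiProcessContainer {

  private final List<Process> running = new ArrayList<>();

  /** Launch an additional process inside the already-allocated sandbox,
   *  instead of releasing the container when the first process exits. */
  Process exec(List<String> command) throws Exception {
    Process p = new ProcessBuilder(command).inheritIO().start();
    running.add(p);
    return p;
  }

  /** The HBase use case: same container (hence same node and local data),
   *  fresh process after a crash or a planned restart. */
  Process restart(Process old, List<String> command) throws Exception {
    old.destroy();
    old.waitFor();
    running.remove(old);
    return exec(command);
  }
}
{code}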