Date: Fri, 25 Mar 2016 08:56:25 +0000 (UTC)
From: "Arun Suresh (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1040) De-link container life cycle from an Allocation

    [ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211613#comment-15211613 ]

Arun Suresh commented on YARN-1040:
-----------------------------------

Firstly, thank you [~vinodkv], [~vvasudev] and [~bikassaha] for reviewing the doc and chiming in with your thoughts.

[~bikassaha]
bq. can see the argument of asking users to use new APIs for new features but requiring existing apps to change their AM/RM implementations….

We might not actually need to do this, if we ensure that the existing external-facing methods on the ContainerManagementProtocol and ApplicationMasterProtocol keep working as expected and introduce the new methods in wrapper protocols. Apps that need the new functionality can use the new API, and those that don't can stick with the old one (until a major release, when we can retire the old protocols). We have tried something along the same lines in YARN-2885 (not committed to trunk yet), where we have a DistributedSchedulingProtocol that extends the ApplicationMasterProtocol and still exposes the old API.

bq. ..just to be able to launch multiple processes does not seem empathetic.

Hmmm.. given that launching multiple processes is a new feature, I feel it should be fine to mandate that apps use the new APIs, no?

----
[~vvasudev]
bq. Is AllocationId a new class that we will introduce or a rename of the existing ContainerId class?

I expect it to be a new class, but my thinking was that it should replace the existing ContainerId in the RM. To preserve backward compatibility, for apps using the older API, we can somehow transform the AllocationId into a ContainerId when the RM responds to the app.

bq. will the NM generate container ids for the individual containers?

That was my plan. As mentioned above, for older apps, ContainerId = AllocationId + '-0', and for apps requesting multiple containers per allocation, ContainerId = AllocationId + '-' + index (an id incrementing from 0); a quick sketch follows below.
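To make the id scheme concrete, here is a minimal sketch of the derivation, treating the AllocationId as an opaque string. The class and method names are made up for illustration; nothing like this exists in the codebase today.

{code:java}
// Hypothetical sketch of the proposed "AllocationId + '-' + index" scheme.
// Neither this class nor containerIdFor() exists in YARN; names are invented.
public final class AllocationIds {

  /**
   * Derives the id the NM would generate for the index-th container
   * started under an allocation. Index 0 is what an older app, which
   * only ever sees one container per allocation, would observe.
   */
  static String containerIdFor(String allocationId, int index) {
    if (index < 0) {
      throw new IllegalArgumentException("index must be >= 0");
    }
    return allocationId + "-" + index;
  }

  public static void main(String[] args) {
    // Older apps: exactly one container per allocation, always suffix -0.
    System.out.println(containerIdFor("allocation_1458895200_0001", 0));
    // New API: the NM hands out an incrementing index per allocation.
    System.out.println(containerIdFor("allocation_1458895200_0001", 1));
  }
}
{code}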
bq. An AM can receive only a single allocation on a Node, The Scheduler will "bundle" all Allocations on a Node for an app into a single Large Allocation.

My thinking was that, given this feature will allow an app to start multiple containers using a single allocation, an app can now reuse the same allocation to start a new container rather than obtain a new allocation. This will minimize the number of Allocations the RM would need to give out.
Thinking further, I understand how this might break backward compatibility (for apps using the older API and expecting multiple ContainerTokens on the same node), so I guess we can remove this restriction and make sure the "bundling" happens only for apps using the new API.

bq. Are you referring to the current ContainerId class? If yes, why is it known only to the AM?

This also concerns the points [~vinodkv] brought up about container exit notifications.
*Today* the ContainerId is known to the RM, since:
* The RM generates the ContainerId, so it obviously needs to know about it.
* The primary means of the RM reclaiming resources from a Node, to schedule waiting apps, is when it receives a Container Complete / Killed notification from the Node heartbeat, for which the ContainerId is necessary to match the container's resource.
* This is also the primary means of the AM being notified of a completed / killed container, viz. via the RM allocateResponse.

In the new scheme of things (rules sketched in code below):
* An Allocation technically never "completes" unless the AM explicitly deactivates it, at which point the Node can notify the RM of the terminated Allocation.
* For backward compatibility, single-use allocations will automatically be deactivated, and the RM notified, when the associated container completes.
* An AM on restart / failover will be notified by the RM of existing Allocations and can query the NM directly for the status of individual containers.
* An NM on restart need not report the status of every container, just the Allocations that were active on the NM. The respective AMs can then query the NM and obtain the status of the containers.

For the above cases, the RM does not need to know about the ContainerId per se, only the AllocationId. The only other case I could think of where the RM needs to know about individual containers is smarter pre-emption, where the RM can pick specific containers within an Allocation to be killed rather than the Allocation itself (I had mentioned this in the doc too, I guess). But that can be addressed in subsequent iterations.
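Here is a rough sketch of those lifecycle rules, i.e., explicit deactivation by the AM versus automatic deactivation of single-use allocations. The tracker class and all method names are invented for illustration and do not correspond to anything in the YARN codebase.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical NM-side bookkeeping for one allocation; names are invented.
final class AllocationTracker {

  private final boolean singleUse; // allocations made via the old API
  private final Set<String> liveContainers = new HashSet<>();
  private boolean active = true;

  AllocationTracker(boolean singleUse) {
    this.singleUse = singleUse;
  }

  void containerStarted(String containerId) {
    liveContainers.add(containerId);
  }

  /**
   * Called when a container exits. Only single-use allocations are
   * auto-deactivated here, which is what preserves old-API semantics;
   * new-API allocations stay active for reuse.
   */
  void containerCompleted(String containerId) {
    liveContainers.remove(containerId);
    if (singleUse && liveContainers.isEmpty()) {
      deactivate();
    }
  }

  /**
   * Explicit deactivation by the AM. This, not a container exit, is the
   * event that would be reported to the RM as a terminated Allocation.
   */
  void deactivate() {
    if (active) {
      active = false;
      notifyRmOfTerminatedAllocation();
    }
  }

  private void notifyRmOfTerminatedAllocation() {
    // Placeholder: in the proposal this would ride on the NM-RM heartbeat.
  }
}
{code}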
----
[~vinodkv]
You brought up some good points; I will incorporate them into the doc.

If you guys are fine with it, I plan to open separate JIRAs under YARN-4726 breaking up this work. I feel we can have more focused discussions there on specific aspects of the design.


> De-link container life cycle from an Allocation
> -----------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>        Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that something could be run in the container while a long-lived process was already running. This can be useful for monitoring and reconfiguring the long-lived process, as well as shutting it down.
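To ground the issue description above, here is a hypothetical sketch of what running (and restarting) more than one process inside a still-alive container could look like. This is plain java.lang.Process plumbing for illustration only; it is not YARN API, and none of these names exist in the ContainerManagementProtocol.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical container-side process manager; names are invented.
final class MultiProcessContainer {

  private final List<Process> running = new ArrayList<>();

  /** Launch an additional process inside the already-allocated sandbox,
   *  instead of releasing the container when the first process exits. */
  Process exec(List<String> command) throws Exception {
    Process p = new ProcessBuilder(command).inheritIO().start();
    running.add(p);
    return p;
  }

  /** The HBase use case: same container (hence same node and local data),
   *  fresh process after a crash or a planned restart. */
  Process restart(Process old, List<String> command) throws Exception {
    old.destroy();
    old.waitFor();
    running.remove(old);
    return exec(command);
  }
}
{code}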