Date: Mon, 12 Sep 2016 07:36:20 +0000 (UTC)
From: "Arun Suresh (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Comment Edited] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15483362#comment-15483362 ]

Arun Suresh edited comment on YARN-5620 at 9/12/16 7:35 AM:
------------------------------------------------------------

Updating patch.
* Addressing [~jianhe]'s latest comments
* some javadoc, checkstyle and javac fixes

bq. IIUC, in this case, the ContainerImpl will receive the KILL event first and move to the KILLING state, and the CONTAINER_KILLED_ON_REQUEST will be sent to the container at KILLING state..

It goes to the KILLING state only if the AM explicitly sends a kill signal or the RM asks the NM to kill. It is also possible that an admin logs into the NM and does a 'kill -9', which will also cause the ContainerLaunch to send CONTAINER_KILLED_ON_REQUEST, but the container won't be in the KILLING state.. right?

bq. ..In testContainerUpgradeSuccess, could you make newStartFile a new upgrade resource, and verify the output is written into it, this verifies the part about the localization part as well.

Actually, if you look at the _prepareContainerUpgrade()_ function, we create a new script file *scriptFile_new*, which is passed into the _prepareContainerLaunchContext()_ function, which associates the new file with a new *dest_file_new* location.. this should verify that the upgrade needed a new localized resource. The output of the script is also written to a new *start_file_n.txt*, which we read and verify to check that the new process has actually started.

Also, by the way:

bq. We can use the ResourceSet#getAllResourcesByVisibility method instead, and so the getLocalPendingRequests method and the new constructor in ContainerLocalizationRequestEvent is not needed

The problem with getAllResourcesByVisibility is that it gets all resources. I just need the pending resources... So if you are ok with it, I'd like to keep it as is..

was (Author: asuresh):
Updating patch.
* Addressing [~jianhe]'s latest comments
* some javadoc, checkstyle and javac fixes

bq. IIUC, in this case, the ContainerImpl will receive the KILL event first and move to the KILLING state, and the CONTAINER_KILLED_ON_REQUEST will be sent to the container at KILLING state..

It goes to KILLING stage only if the AM explicitly sends a kill signal or the RM asks NM to kill. It is also possible that the an admin logs into the NM and does a 'kill -9' which will also cause the ContainerLaunch to send CONTAINER_KILLED_ON_REQUEST but it wont be in KILLING state.. right ?

bq. ..In testContainerUpgradeSuccess, could you make newStartFile a new upgrade resource, and verify the output is written into it, this verifies the part about the localization part as well.

Actually if you look at the _prepareContainerUpgrade()_ function, we create a new script file *scriptFile_new* while passed into the _prepareContainerLaunchContext()_ function which associates the new file to a new *dest_file_new* location.. this should verify that the upgrade needed a new localized resource. The output of the script is also written to a new *start_file_n.txt* which we read and verify to check if the new process has actually started.

Also by the way:

bq. We can use the ResourceSet#getAllResourcesByVisibility method instead, and so the getLocalPendingRequests method and the new constructor in ContainerLocalizationRequestEvent is not needed

The problem with getAllResourcesByVisibility, is it gets all resources. I just need the pending resources... So if you are ok with it, Id like to keep it as is..
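
For anyone following the KILLING-state question above, the two event orderings under discussion can be sketched as a toy state machine. This is only an illustration under stated assumptions, not the actual ContainerImpl transitions: the class name KillOrderingSketch, the run() helper, the RUNNING start state, and the terminal states KILLED_AFTER_REQUEST and EXITED_UNEXPECTEDLY are hypothetical names made up for this sketch. Only KILLING, the KILL event and CONTAINER_KILLED_ON_REQUEST come from the thread.

```java
// Toy model of the two orderings discussed in the comment.
// NOT the real NodeManager state machine; state names other than KILLING are hypothetical.
class KillOrderingSketch {

    enum State { RUNNING, KILLING, KILLED_AFTER_REQUEST, EXITED_UNEXPECTEDLY }

    enum Event { KILL, CONTAINER_KILLED_ON_REQUEST }

    // Returns the next state for a single event.
    static State next(State current, Event event) {
        switch (current) {
            case RUNNING:
                // An explicit kill (AM or RM) moves to KILLING first.
                // A CONTAINER_KILLED_ON_REQUEST with no prior KILL (e.g. an admin's
                // 'kill -9') arrives while the container is still RUNNING.
                return event == Event.KILL ? State.KILLING : State.EXITED_UNEXPECTEDLY;
            case KILLING:
                // The process exit is observed after the kill was requested.
                return event == Event.CONTAINER_KILLED_ON_REQUEST
                        ? State.KILLED_AFTER_REQUEST
                        : State.KILLING;
            default:
                // Terminal states ignore further events in this sketch.
                return current;
        }
    }

    // Feeds a sequence of events to a container that starts in RUNNING.
    static State run(Event... events) {
        State state = State.RUNNING;
        for (Event event : events) {
            state = next(state, event);
        }
        return state;
    }

    public static void main(String[] args) {
        // Kill requested by AM/RM: passes through KILLING.
        System.out.println(run(Event.KILL, Event.CONTAINER_KILLED_ON_REQUEST));
        // Process killed out of band: never enters KILLING.
        System.out.println(run(Event.CONTAINER_KILLED_ON_REQUEST));
    }
}
```

The point of the sketch is just that the second ordering has to be handled outside the KILLING state, which is why the event cannot be assumed to arrive only there.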

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -----------------------------------------------------------------------------
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Arun Suresh
> Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, YARN-5620.009.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to support upgrade of a running container with a new {{ContainerLaunchContext}} as well as the ability to rollback the upgrade if the container is not able to restart using the new launch Context.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)