Date: Mon, 12 May 2014 15:17:15 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: yarn-issues@hadoop.apache.org
Reply-To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-1337) Recover containers upon nodemanager restart

     [ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-1337:
-----------------------------
    Description: To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered.
    (was: To support work-preserving NM restart we need to recover the state of the containers that were active when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate.)
    Summary: Recover containers upon nodemanager restart (was: Recover active container state upon nodemanager restart)

Updating headline and description to note that this task also includes recovering the state of finished containers.

> Recover containers upon nodemanager restart
> -------------------------------------------
>
>                 Key: YARN-1337
>                 URL: https://issues.apache.org/jira/browse/YARN-1337
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>   Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>
> To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered.

--
This message was sent by Atlassian JIRA
(v6.2#6252)