Return-Path: X-Original-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 09D0C10688 for ; Mon, 14 Apr 2014 16:36:32 +0000 (UTC) Received: (qmail 4242 invoked by uid 500); 14 Apr 2014 16:35:09 -0000 Delivered-To: apmail-hadoop-yarn-issues-archive@hadoop.apache.org Received: (qmail 3116 invoked by uid 500); 14 Apr 2014 16:34:28 -0000 Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: yarn-issues@hadoop.apache.org Delivered-To: mailing list yarn-issues@hadoop.apache.org Received: (qmail 2826 invoked by uid 99); 14 Apr 2014 16:34:17 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 14 Apr 2014 16:34:17 +0000 Date: Mon, 14 Apr 2014 16:34:17 +0000 (UTC) From: "Jason Lowe (JIRA)" To: yarn-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Assigned] (YARN-1337) Recover active container state upon nodemanager restart MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-1337: -------------------------------- Assignee: Jason Lowe > Recover active container state upon nodemanager restart > ------------------------------------------------------- > > Key: YARN-1337 > URL: https://issues.apache.org/jira/browse/YARN-1337 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Affects Versions: 2.3.0 > Reporter: Jason Lowe > Assignee: Jason Lowe > > To support work-preserving NM restart we need to recover the state of the containers that were active when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate. -- This message was sent by Atlassian JIRA (v6.2#6252)