Date: Sat, 4 Jan 2014 21:17:52 +0000 (UTC)
From: "Jian He (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits

    [ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862419#comment-13862419 ]

Jian He commented on YARN-1490:
-------------------------------

bq.
We are better off changing the dispatcher related logic to look up the appId of the container, get the current attempt of that appId and then route the event to the current event

I thought about this. It can lead to a race: while the AM is restarting, the new attempt may not yet have been created in the scheduler, so the scheduler still points to the previous, dead attempt, and container events are then sent to that dead attempt.

> RM should optionally not kill all containers when an ApplicationMaster exits
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1490
>                 URL: https://issues.apache.org/jira/browse/YARN-1490
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Jian He
>         Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch
>
>
> This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers, some may not want to. This should be an option.
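
To make the race in Jian He's comment concrete, here is a minimal Java sketch of routing a container event by app ID to whatever attempt the scheduler currently holds. It is only an illustration under stated assumptions: the class AttemptRoutingSketch, its methods registerAttempt, appIdOf and routeContainerEvent, and the "<appId>/<containerNumber>" ID layout are all hypothetical and are not YARN classes or code from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the routing idea; these are NOT YARN classes.
class AttemptRoutingSketch {
    // Scheduler-side view: application ID -> attempt number it treats as current.
    private final Map<String, Integer> currentAttempt = new HashMap<>();

    // Called when the scheduler learns about a (new) attempt for an application.
    void registerAttempt(String appId, int attemptNumber) {
        currentAttempt.put(appId, attemptNumber);
    }

    // Assumed ID layout "<appId>/<containerNumber>", for illustration only.
    static String appIdOf(String containerId) {
        int slash = containerId.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("unexpected container ID: " + containerId);
        }
        return containerId.substring(0, slash);
    }

    // Route a container event by app ID and return the attempt that receives it.
    int routeContainerEvent(String containerId) {
        Integer attempt = currentAttempt.get(appIdOf(containerId));
        if (attempt == null) {
            throw new IllegalStateException("no attempt registered for " + containerId);
        }
        return attempt;
    }

    public static void main(String[] args) {
        AttemptRoutingSketch scheduler = new AttemptRoutingSketch();
        scheduler.registerAttempt("app_1", 1);

        // Attempt 1's AM has died and attempt 2 is being started, but the
        // scheduler has not been told about attempt 2 yet.
        int stale = scheduler.routeContainerEvent("app_1/container_7");
        System.out.println("event routed to attempt " + stale);

        // Only after the new attempt is registered does routing pick it up.
        scheduler.registerAttempt("app_1", 2);
        int fresh = scheduler.routeContainerEvent("app_1/container_7");
        System.out.println("event routed to attempt " + fresh);
    }
}
```

In this model, any container event that arrives between the AM's death and the registration of the new attempt is delivered to the dead attempt, which is the window the comment is concerned about.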