Date: Fri, 23 May 2014 22:00:04 +0000 (UTC)
From: "Anubhav Dhoot (JIRA)"
To: yarn-issues@hadoop.apache.org
Reply-To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007768#comment-14007768 ]

Anubhav Dhoot commented on YARN-1365:
-------------------------------------

The error is that RMAppRecoveredTransition leaves it in LAUNCHED and then the scheduler executes ATTEMPT_ADDED. I see Jian fixed it in a certain way in YARN-1368, but that only addresses it if it's in LAUNCHED. If the state reaches RUNNING before that, we still get the error.

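To make the failure mode above concrete, here is a toy sketch. It is not the real RM state machine: the State enum, the Attempt class, and the handles helper are all hypothetical names, and the assumption that a fresh attempt accepts ATTEMPT_ADDED only in SUBMITTED is made up for illustration. It only shows why accepting the event in LAUNCHED alone does not help once a recovered attempt has already reached RUNNING.

```java
import java.util.EnumSet;
import java.util.Set;

public class RecoveredAttemptSketch {
    // Toy attempt states (assumption: the real state machine has many more).
    public enum State { SUBMITTED, SCHEDULED, LAUNCHED, RUNNING }

    /** Toy attempt: ATTEMPT_ADDED is valid only in the configured states. */
    public static final class Attempt {
        private State state;
        private final Set<State> acceptsAttemptAdded;

        public Attempt(State initial, Set<State> acceptsAttemptAdded) {
            this.state = initial;
            this.acceptsAttemptAdded = acceptsAttemptAdded;
        }

        /** Simulates the scheduler delivering ATTEMPT_ADDED to this attempt. */
        public void onAttemptAdded() {
            if (!acceptsAttemptAdded.contains(state)) {
                throw new IllegalStateException(
                    "Invalid event ATTEMPT_ADDED at " + state);
            }
            // Fresh attempts move forward; recovered ones keep their state.
            if (state == State.SUBMITTED) {
                state = State.SCHEDULED;
            }
        }
    }

    /** Returns true if an attempt recovered in the given state survives ATTEMPT_ADDED. */
    public static boolean handles(State recovered, Set<State> accepted) {
        try {
            new Attempt(recovered, accepted).onAttemptAdded();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Set<State> original = EnumSet.of(State.SUBMITTED);
        Set<State> launchedOnly = EnumSet.of(State.SUBMITTED, State.LAUNCHED);

        // Recovered in LAUNCHED, no special handling: rejected.
        System.out.println(handles(State.LAUNCHED, original));     // false
        // Recovered in LAUNCHED, with a LAUNCHED-only fix: accepted.
        System.out.println(handles(State.LAUNCHED, launchedOnly)); // true
        // Attempt reached RUNNING before the event arrived: still rejected.
        System.out.println(handles(State.RUNNING, launchedOnly));  // false
    }
}
```

The last case is the one that still breaks: any fix keyed on a single state has to race against the attempt moving on.
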
The option I see is to pass a flag in AppAttemptAddedSchedulerEvent that tells the scheduler not to issue ATTEMPT_ADDED. This will be set in RMAppRecoveredTransition. Let me know what you think.

> ApplicationMasterService to allow Register and Unregister of an app that was running before restart
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1365
>                 URL: https://issues.apache.org/jira/browse/YARN-1365
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.initial.patch
>
>
> For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit.

--
This message was sent by Atlassian JIRA
(v6.2#6252)