Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Thu, 17 Oct 2013 21:53:44 +0000 (UTC)
From: "Omkar Vinit Joshi (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12669116.1379452062903.83264.1382046824851@arcas>
In-Reply-To: <JIRA.12669116.1379452062903@arcas>
References: <JIRA.12669116.1379452062903@arcas>
Subject: [jira] [Commented] (YARN-1210) During RM restart, RM should start a
 new attempt only when previous attempt exits for real
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798485#comment-13798485 ] 

Omkar Vinit Joshi commented on YARN-1210:
-----------------------------------------

Taking it over.

> During RM restart, RM should start a new attempt only when previous attempt exits for real
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-1210
>                 URL: https://issues.apache.org/jira/browse/YARN-1210
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Jian He
>
> When the RM recovers, it can wait for existing AMs to contact the RM again and then kill them forcefully before starting a new AM. In the worst case, the RM will start a new AppAttempt after waiting for 10 minutes (the expiry interval). This way we'll minimize multiple AMs racing with each other. This can help avoid issues with downstream components like Pig, Hive and Oozie during RM restart.
> In the meantime, new apps will proceed as usual while existing apps wait for recovery.
> This can continue to be useful after work-preserving restart, so that AMs which can properly sync back up with the RM can continue to run and those that don't are guaranteed to be killed before a new attempt is started.
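
The recovery rule described above could be sketched roughly as follows. This is only an illustration of the decision logic, not YARN code: the names `RecoveredAttempt`, `Action`, `next_action` and `AM_EXPIRY_SECONDS` are hypothetical, and the 10 minute value is simply the expiry interval quoted in the description.

```python
from dataclasses import dataclass
from enum import Enum

# Assumption: 10 minutes, the expiry interval mentioned in the description.
AM_EXPIRY_SECONDS = 10 * 60


class Action(Enum):
    WAIT = "wait"                            # previous AM may still be alive
    KILL_PREVIOUS = "kill_previous"          # previous AM reached the RM; kill it first
    START_NEW_ATTEMPT = "start_new_attempt"  # safe to launch a new AppAttempt


@dataclass
class RecoveredAttempt:
    """Hypothetical record of the last attempt known to the RM before restart."""
    am_resynced: bool = False       # the old AM has contacted the restarted RM
    confirmed_exited: bool = False  # the old AM is known to have exited


def next_action(attempt: RecoveredAttempt, seconds_since_restart: float) -> Action:
    """Decide what a recovering RM should do for one recovered app."""
    if attempt.confirmed_exited:
        # The previous attempt exited for real, so a new one cannot race with it.
        return Action.START_NEW_ATTEMPT
    if attempt.am_resynced:
        # The old AM is alive and reachable: kill it before starting anything new.
        return Action.KILL_PREVIOUS
    if seconds_since_restart >= AM_EXPIRY_SECONDS:
        # Worst case: the old AM never called back within the expiry interval.
        return Action.START_NEW_ATTEMPT
    return Action.WAIT
```

In this sketch the RM would re-evaluate `next_action` per recovered app until it returns `START_NEW_ATTEMPT`; new apps are unaffected because they have no recovered attempt to wait on.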

