Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Tue, 12 Apr 2016 06:50:25 +0000 (UTC)
From: "sandflee (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12742540.1411063670000.203385.1460443825637@Atlassian.JIRA>
In-Reply-To: <JIRA.12742540.1411063670000@Atlassian.JIRA>
References: <JIRA.12742540.1411063670000@Atlassian.JIRA>
 <JIRA.12742540.1411063670041@arcas>
Subject: [jira] [Commented] (YARN-2567) Add a percentage-node threshold for
 RM to wait for new allocations after restart/failover
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236701#comment-15236701 ] 

sandflee commented on YARN-2567:
--------------------------------

The main idea is to store NM status lazily and, if the RM fails over, recover the NM status as follows:
RUNNING/UNHEALTHY/DECOMMISSIONING state: recover the RMNode to the NEW state, register a timer, and wait for the NM to register and become active.
LOST/DECOMMISSIONED/SHUTDOWN state: recover to the corresponding finished state.
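
A rough sketch of the mapping described above, only to make the proposal concrete. The class, enum, and method names here (NodeRecoverySketch, StoredState, RecoveredState, recover, needsRegistrationTimer) are hypothetical and are not the actual RMNode/ResourceManager API; what should happen when the timer expires is left out because it is not settled in this comment.

```java
public class NodeRecoverySketch {

    /** NM states as lazily stored before the failover (names from the comment above). */
    public enum StoredState {
        RUNNING, UNHEALTHY, DECOMMISSIONING, LOST, DECOMMISSIONED, SHUTDOWN
    }

    /** States a node may be recovered into (hypothetical, for illustration only). */
    public enum RecoveredState {
        NEW, LOST, DECOMMISSIONED, SHUTDOWN
    }

    /** Maps a stored state to the state the node is recovered into. */
    public static RecoveredState recover(StoredState stored) {
        switch (stored) {
            case RUNNING:
            case UNHEALTHY:
            case DECOMMISSIONING:
                // Live-ish nodes restart from NEW and must register again.
                return RecoveredState.NEW;
            case LOST:
                return RecoveredState.LOST;
            case DECOMMISSIONED:
                return RecoveredState.DECOMMISSIONED;
            case SHUTDOWN:
                return RecoveredState.SHUTDOWN;
            default:
                throw new IllegalArgumentException("Unknown state: " + stored);
        }
    }

    /** Only nodes recovered to NEW need a timer while waiting for registration. */
    public static boolean needsRegistrationTimer(StoredState stored) {
        return recover(stored) == RecoveredState.NEW;
    }

    public static void main(String[] args) {
        for (StoredState s : StoredState.values()) {
            String timer = needsRegistrationTimer(s) ? " (register timer)" : "";
            System.out.println(s + " -> " + recover(s) + timer);
        }
    }
}
```

Keeping the mapping in one pure function would make the recovery rule easy to unit-test separately from the timer and registration handling.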


> Add a percentage-node threshold for RM to wait for new allocations after restart/failover
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-2567
>                 URL: https://issues.apache.org/jira/browse/YARN-2567
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>
> This is the remaining part of YARN-2001 - to halt allocations after restart till x% of nodes sync back with the RM. This is useful for avoiding bad scheduling during the time the nodes are still joining back after a restart/failover.

