Date: Wed, 28 Sep 2016 19:36:20 +0000 (UTC)
From: "Daniel Templeton (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5685) Non-embedded HA failover is broken

    [ https://issues.apache.org/jira/browse/YARN-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530642#comment-15530642 ]

Daniel Templeton commented on YARN-5685:
----------------------------------------

The issue is more than that change from YARN-4559.
I'm still digging, but even with that issue resolved, the RMs all remain stuck in standby because of a circular dependency: the state store isn't started until the RM transitions to active, but the RM doesn't transition to active unless the state store is started.

> Non-embedded HA failover is broken
> ----------------------------------
>
>                 Key: YARN-5685
>                 URL: https://issues.apache.org/jira/browse/YARN-5685
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.0, 3.0.0-alpha1
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>
> YARN-4559 broke RM HA when embedded automatic failover is disabled. The {{ZKRMStateStore}} will now only start its monitoring thread when automatic failover is not enabled (which is patently useless). I presume the intended change was to have the monitoring thread started when automatic failover is not *embedded*.
> If HA is enabled with automatic failover enabled and embedded failover disabled, all RMs come up in standby state. To make one of them active, the {{--forcemanual}} flag must be used when manually triggering the state change. Should the active go down, the standby will not become active and must be manually transitioned with the {{--forcemanual}} flag.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
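
To make the condition from the issue description concrete, here is a minimal Java sketch. It is not the actual {{ZKRMStateStore}} code: the class and method names are hypothetical, and the "presumed" predicate is only one reading of the stated intent (start the monitoring thread whenever the embedded elector is not in use).

```java
// Hypothetical model of the monitoring-thread condition described in YARN-5685.
// None of these names come from the Hadoop code base.
public class FailoverMonitorSketch {

    // Condition as the issue describes it after YARN-4559:
    // the monitoring thread starts only when automatic failover is NOT enabled.
    static boolean startsMonitorAsDescribed(boolean autoFailoverEnabled, boolean embedded) {
        return !autoFailoverEnabled;
    }

    // Assumed intent: the monitoring thread starts whenever the embedded
    // elector is not in use, i.e. unless failover is both automatic and embedded.
    static boolean startsMonitorAsPresumed(boolean autoFailoverEnabled, boolean embedded) {
        return !(autoFailoverEnabled && embedded);
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        // Print both predicates for every combination of the two settings.
        for (boolean auto : flags) {
            for (boolean emb : flags) {
                System.out.printf("autoFailover=%b embedded=%b described=%b presumed=%b%n",
                        auto, emb,
                        startsMonitorAsDescribed(auto, emb),
                        startsMonitorAsPresumed(auto, emb));
            }
        }
    }
}
```

The two predicates differ only for automatic failover enabled with embedded failover disabled, which is the configuration the issue reports as broken: the described condition yields false there, while the presumed one yields true.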