Date: Thu, 10 Nov 2016 18:01:58 +0000 (UTC)
From: "Jian He (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

[ https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654697#comment-15654697 ]

Jian He commented on YARN-5694:
-------------------------------

bq. 
If we agree that it's bad to have two RMs accidentally sharing the same state store,

In non-HA mode, there is currently no protection in the ZKStore that prevents two RMs from sharing the same store. All of the ACL-setting code is used only in HA mode. So with the current patch, I doubt the verifyThread will ever get a NoAuthException unless the user changes the ACLs manually, which means the handling code in this patch will not be triggered with the default settings. Maybe I'm wrong; you may want to try it on a real cluster.

bq. why would you not want to catch the issue as early as possible?

My point is, first: will this code work, given the above? Second: if there's no difference in functionality, why do I need to start a thread that pings ZK continuously every few seconds? Of course, I might be missing something; please clarify.

Also, is the use case mainly about two clusters sharing the same ZK store with the same path? IMHO, this is not a primary use case to solve: if the user misconfigured it, that's the user's fault. There are many other places where this can go wrong, e.g. if two clusters configure the same path for anything on HDFS.

If the use case is about two RMs sharing the same ZK path in the same cluster in non-HA mode, I think the invalid RM will not take any workload in the first place: clients and NMs will not switch to that RM if HA is not configured properly.
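
To make the first concern concrete, here is a minimal sketch of a periodic verification loop. It is not the actual ZKRMStateStore code: the `StoreProbe` interface, the local `NoAuthException` class, `VerifyThread`, and `countHandlerCalls` are all hypothetical names introduced only for illustration. It assumes the probe fails only when ACLs deny this RM access, so with open (default, non-HA) ACLs the handler never fires, no matter how often the thread runs.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VerifyThreadSketch {

    /** Hypothetical stand-in for a ZooKeeper "not authorized" failure. */
    static class NoAuthException extends Exception {
        private static final long serialVersionUID = 1L;
    }

    /** Hypothetical probe: succeeds only while this RM may still write to the store. */
    interface StoreProbe {
        void verifyAccess() throws NoAuthException;
    }

    /** Periodically probes the store and invokes a handler once access is denied. */
    static class VerifyThread extends Thread {
        private final StoreProbe probe;
        private final long intervalMs;
        private final Runnable onAccessLost;

        VerifyThread(StoreProbe probe, long intervalMs, Runnable onAccessLost) {
            this.probe = probe;
            this.intervalMs = intervalMs;
            this.onAccessLost = onAccessLost;
            setDaemon(true);
        }

        /** Runs one probe; returns true if this RM still has access. */
        boolean checkOnce() {
            try {
                probe.verifyAccess();
                return true;
            } catch (NoAuthException e) {
                // Only reachable if ACLs were changed so that this RM is locked out.
                onAccessLost.run();
                return false;
            }
        }

        @Override
        public void run() {
            try {
                // Keep probing until access is lost or the thread is interrupted.
                while (checkOnce()) {
                    Thread.sleep(intervalMs);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Runs `rounds` probes synchronously and returns how often the handler fired. */
    static int countHandlerCalls(StoreProbe probe, int rounds) {
        AtomicInteger calls = new AtomicInteger();
        VerifyThread t = new VerifyThread(probe, 1000L, calls::incrementAndGet);
        for (int i = 0; i < rounds; i++) {
            t.checkOnce();
        }
        return calls.get();
    }

    public static void main(String[] args) {
        // Assumption: with open ACLs (no fencing), the probe always succeeds.
        StoreProbe openAcls = () -> { };
        // Assumption: another RM has fenced the store, so every probe is rejected.
        StoreProbe fenced = () -> { throw new NoAuthException(); };

        System.out.println("open ACLs, handler calls: " + countHandlerCalls(openAcls, 5)); // 0
        System.out.println("fenced, handler calls:    " + countHandlerCalls(fenced, 5));   // 5
    }
}
```

Under these assumptions, the loop detects a second RM only if something has changed the ACLs; whether that holds for the real patch would still need to be checked on a real cluster.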

> ZKRMStateStore should always start its verification thread to prevent accidental state store corruption
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5694
>                 URL: https://issues.apache.org/jira/browse/YARN-5694
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>   Affects Versions: 3.0.0-alpha1
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>              Labels: oct16-medium
>         Attachments: YARN-5694.001.patch, YARN-5694.002.patch, YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch
>
>
> There are two cases. In branch-2.7, the {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when using embedded or Curator failover. In branch-2.8, the {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is disabled, which makes no sense. Based on the JIRA that introduced that change (YARN-4559), I believe the intent was to start it only when embedded failover is disabled.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)