Date: Wed, 15 Jan 2014 09:47:19 +0000 (UTC)
From: "chendihao (JIRA)"
To: dev@hbase.apache.org
Reply-To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-10345) HMaster should not serve when disconnected with ZooKeeper

chendihao created HBASE-10345:
---------------------------------

             Summary: HMaster should not serve when disconnected with ZooKeeper
                 Key: HBASE-10345
                 URL: https://issues.apache.org/jira/browse/HBASE-10345
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.94.3
            Reporter: chendihao

Refer to HBASE-9468 (Previous active master can still serves RPC request when it is trying recovering expired zk session): we can fail fast to avoid having two active masters at the same time. But this problem may also occur before the session has expired. When the active master receives a Disconnected event, it cannot be sure that it will be able to communicate with ZooKeeper again later.
The active master also cannot know whether a backup master has become the new active master until it receives an Expired event (which may never arrive). During this period, when it is unclear which master is active, the current active master should not serve requests (perhaps by turning off the RpcServer).

Here is the statement from "ZooKeeper Distributed Process Coordination", p. 101:

{quote}
If the developer is not careful, the old leader will continue to act as a leader and may take actions that conflict with those of the new leader. For this reason, when a process receives a Disconnected event, the process should suspend actions taken as a leader until it reconnects. Normally this reconnect happens very quickly.
{quote}

So it is just as necessary to handle the Disconnected event as it is to handle the Expired event.
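
The behaviour proposed above could be sketched roughly as follows. This is only an illustration, not HMaster code and not a proposed patch: the SessionState enum is a local stand-in for the three ZooKeeper session states discussed here, and LeaderGate, onSessionState, mayServe and isAborted are hypothetical names. It assumes a master that stops serving on Disconnected, resumes on reconnect (the session is then still valid), and gives up permanently on Expired, in line with the fail-fast idea from HBASE-9468.

```java
// Local stand-in for the ZooKeeper session states discussed in this issue
// (hypothetical; a real watcher would receive these from the ZooKeeper client).
enum SessionState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

// Hypothetical gate that an active master could consult before serving a request.
final class LeaderGate {
    private boolean serving = true;   // true while it is safe to act as the active master
    private boolean aborted = false;  // true once the session has expired

    // Called for every session state change.
    synchronized void onSessionState(SessionState state) {
        switch (state) {
            case DISCONNECTED:
                // We no longer know who the active master is: suspend serving.
                serving = false;
                break;
            case SYNC_CONNECTED:
                // Reconnected within the session timeout: the session is still valid.
                if (!aborted) {
                    serving = true;
                }
                break;
            case EXPIRED:
                // The session is gone; another master may already be active.
                aborted = true;
                serving = false;
                break;
        }
    }

    synchronized boolean mayServe() {
        return serving;
    }

    synchronized boolean isAborted() {
        return aborted;
    }
}
```

In such a design the RPC handlers would check mayServe() before doing any work, so a Disconnected event takes effect immediately instead of waiting for an Expired event that may never be delivered.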