From dev-return-72338-archive-asf-public=cust-asf.ponee.io@hbase.apache.org  Sat Jan 19 01:26:06 2019
Return-Path: <dev-return-72338-archive-asf-public=cust-asf.ponee.io@hbase.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 872F8180647
	for <archive-asf-public@cust-asf.ponee.io>; Sat, 19 Jan 2019 01:26:05 +0100 (CET)
Received: (qmail 66205 invoked by uid 500); 19 Jan 2019 00:26:04 -0000
Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@hbase.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@hbase.apache.org>
List-Post: <mailto:dev@hbase.apache.org>
List-Id: <dev.hbase.apache.org>
Reply-To: dev@hbase.apache.org
Delivered-To: mailing list dev@hbase.apache.org
Received: (qmail 66152 invoked by uid 99); 19 Jan 2019 00:26:04 -0000
Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142)
    by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 19 Jan 2019 00:26:04 +0000
Received: from localhost (localhost [127.0.0.1])
	by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 17534CC7DB
	for <dev@hbase.apache.org>; Sat, 19 Jan 2019 00:26:04 +0000 (UTC)
X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org
X-Spam-Flag: NO
X-Spam-Score: -110.301
X-Spam-Level:
X-Spam-Status: No, score=-110.301 tagged_above=-999 required=6.31
	tests=[ENV_AND_HDR_SPF_MATCH=-0.5, RCVD_IN_DNSWL_MED=-2.3,
	SPF_PASS=-0.001, USER_IN_DEF_SPF_WL=-7.5, USER_IN_WHITELIST=-100]
	autolearn=disabled
Received: from mx1-lw-eu.apache.org ([10.40.0.8])
	by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024)
	with ESMTP id oAeuqNkBoRtq for <dev@hbase.apache.org>;
	Sat, 19 Jan 2019 00:26:02 +0000 (UTC)
Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139])
	by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 2DA1F5FB63
	for <dev@hbase.apache.org>; Sat, 19 Jan 2019 00:26:02 +0000 (UTC)
Received: from jira-lw-us.apache.org (unknown [207.244.88.139])
	by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 224CAE2699
	for <dev@hbase.apache.org>; Sat, 19 Jan 2019 00:26:01 +0000 (UTC)
Received: from jira-lw-us.apache.org (localhost [127.0.0.1])
	by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 8CFD82566A
	for <dev@hbase.apache.org>; Sat, 19 Jan 2019 00:26:00 +0000 (UTC)
Date: Sat, 19 Jan 2019 00:26:00 +0000 (UTC)
From: "Sergey Shelukhin (JIRA)" <jira@apache.org>
To: dev@hbase.apache.org
Message-ID: <JIRA.13210502.1547857532000.135780.1547857560565@Atlassian.JIRA>
In-Reply-To: <JIRA.13210502.1547857532000@Atlassian.JIRA>
References: <JIRA.13210502.1547857532000@Atlassian.JIRA> <JIRA.13210502.1547857532542@jira-lw-us.apache.org>
Subject: [jira] [Created] (HBASE-21744) timeout for server list refresh
 calls
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

Sergey Shelukhin created HBASE-21744:
----------------------------------------

             Summary: timeout for server list refresh calls 
                 Key: HBASE-21744
                 URL: https://issues.apache.org/jira/browse/HBASE-21744
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin


Not sure why yet, but when the cluster is in an overall bad state, we are seeing cases where, after an RS dies and deletes its znode, the notification appears to be lost, so the master doesn't detect the failure. ZK itself appears to be healthy and doesn't report anything unusual.
Only after some other change is made to the server list does the master rescan the list and pick up the change it missed. It might make sense to add a config that triggers a refresh if one hasn't happened for a while (e.g. 1 minute).
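
To make the proposal concrete, here is a minimal sketch of the staleness check, written in Python for brevity rather than as HBase code. Everything in it is hypothetical: the `ServerListTracker` class, the `list_servers` callback (standing in for listing the RS znodes), and the `max_age_seconds` setting (standing in for the proposed config) are illustrative names, not existing HBase or ZooKeeper APIs. It assumes a periodic task calls `refresh_if_stale()`, and that normal watch-driven rescans go through `refresh()` so they reset the timer.

```python
import time
from typing import Callable, List, Optional, Set


class ServerListTracker:
    """Hypothetical master-side cache of live region servers."""

    def __init__(self,
                 list_servers: Callable[[], List[str]],
                 max_age_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._list_servers = list_servers   # assumed: returns current RS znode names
        self._max_age = max_age_seconds     # assumed config: max time between rescans
        self._clock = clock                 # injectable so the logic can be tested
        self.servers: Set[str] = set()
        self._last_refresh: Optional[float] = None
        self.refresh()

    def refresh(self) -> Set[str]:
        """Rescan the full list; return servers that vanished since the last scan."""
        current = set(self._list_servers())
        removed = self.servers - current
        self.servers = current
        self._last_refresh = self._clock()
        return removed

    def refresh_if_stale(self) -> Set[str]:
        """Rescan only if no refresh has happened within max_age_seconds."""
        if self._clock() - self._last_refresh >= self._max_age:
            return self.refresh()
        return set()


if __name__ == "__main__":
    znodes = ["rs1", "rs2", "rs3"]
    now = [0.0]
    tracker = ServerListTracker(lambda: list(znodes), 60.0, lambda: now[0])

    znodes.remove("rs2")   # RS dies, but the notification never arrives
    now[0] = 30.0
    print(tracker.refresh_if_stale())   # too early, nothing is rescanned
    now[0] = 61.0
    print(tracker.refresh_if_stale())   # forced rescan notices rs2 is gone
```

With a check like this, a lost notification would delay failure detection by at most the configured interval instead of lasting until an unrelated change happens to trigger a rescan.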


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)