Date: Thu, 13 Oct 2016 14:11:20 +0000 (UTC)
From: "Heng Chen (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

    [ https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572030#comment-15572030 ]

Heng Chen commented on HBASE-16807:
-----------------------------------

Will you upload a patch for branch-1.1 and branch-1.2?

> RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-16807
>                 URL: https://issues.apache.org/jira/browse/HBASE-16807
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Pankaj Kumar
>            Assignee: Pankaj Kumar
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.3.patch, HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>
> It's a little odd, but it happened in a production environment that a few RegionServers missed the master znode creation notification on master failover. In that case ZooKeeperNodeTracker does not refresh its cached data, and MasterAddressTracker keeps returning the old active HMaster's details to the RegionServer on ServiceException.
> Although we recreate the RegionServer status stub on failure, we do so without refreshing the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
> boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
>     sn = this.masterAddressTracker.getMasterAddress(refresh);
>     if (sn == null) {
>       if (!keepLooping()) {
>         // give up with no connection.
>         LOG.debug("No master found and cluster is stopped; bailing out");
>         return null;
>       }
>       if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>         LOG.debug("No master found; retry");
>         previousLogTime = System.currentTimeMillis();
>       }
>       refresh = true; // let's try pull it from ZK directly
>       if (sleep(200)) {
>         interrupted = true;
>       }
>       continue;
>     }
> {code}
> Here we refresh the node only when 'sn' is null; otherwise the same cached data is used.
> So in the above case the RegionServer will never report to the active HMaster successfully until an HMaster failover or a RegionServer restart.
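
To make the failure mode in the quoted description concrete, here is a minimal, self-contained sketch. It is not the HBase code and not the attached patch. FakeTracker, tryConnect and findActiveMaster are hypothetical names invented for illustration; only the getMasterAddress(boolean refresh) shape is taken from the excerpt. The sketch assumes that a failed connection attempt is the signal to bypass the cache on the next lookup, which is one possible way to address the problem described above.

```java
public class MasterLookupSketch {

  /** Hypothetical stand-in for MasterAddressTracker: a cached address plus the value currently in ZooKeeper. */
  public static class FakeTracker {
    String cached;            // what the tracker last saw (may be stale if a watch event was missed)
    final String inZooKeeper; // what the master znode holds right now

    public FakeTracker(String cached, String inZooKeeper) {
      this.cached = cached;
      this.inZooKeeper = inZooKeeper;
    }

    /** Same shape as getMasterAddress(refresh) in the excerpt: re-read the source only when asked. */
    public String getMasterAddress(boolean refresh) {
      if (refresh) {
        cached = inZooKeeper;
      }
      return cached;
    }
  }

  /** Hypothetical connection attempt: only the master currently registered in ZooKeeper is reachable. */
  static boolean tryConnect(FakeTracker tracker, String address) {
    return address.equals(tracker.inZooKeeper);
  }

  /**
   * Looks up the master and tries to connect, up to maxAttempts times.
   * With refreshOnFailure == false this mimics the excerpt: refresh is set only when the
   * address is null, so a stale but non-null cached address is reused on every attempt.
   */
  public static String findActiveMaster(FakeTracker tracker, boolean refreshOnFailure, int maxAttempts) {
    boolean refresh = false; // for the first time, use cached data
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      String sn = tracker.getMasterAddress(refresh);
      if (sn == null) {
        refresh = true; // no address at all: read the source directly next time
        continue;
      }
      if (tryConnect(tracker, sn)) {
        return sn;
      }
      if (refreshOnFailure) {
        refresh = true; // assumption: a failed attempt also invalidates the cached address
      }
    }
    return null; // never reached a live master
  }

  public static void main(String[] args) {
    FakeTracker a = new FakeTracker("old-master:16000", "new-master:16000");
    System.out.println("refresh only on null:   " + findActiveMaster(a, false, 5));
    FakeTracker b = new FakeTracker("old-master:16000", "new-master:16000");
    System.out.println("refresh on failure too: " + findActiveMaster(b, true, 5));
  }
}
```

With a stale cached address, the first call exhausts its attempts and returns null, while the second picks up the new master on its second attempt. The real method also has to handle sleeping, interruption and cluster shutdown, all of which are left out here.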