Date: Wed, 10 Jan 2018 14:23:03 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-19694) The initialization order for a fresh cluster is incorrect

    [ https://issues.apache.org/jira/browse/HBASE-19694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320317#comment-16320317 ]

stack commented on HBASE-19694:
-------------------------------

.004 addresses review up on rb.

.003 had four timeouts. I can't make them timeout locally. They don't look related.

TestHRegionWithInMemoryFlush
TestFlushWithThroughputController
TestSeekOptimizations
TestRegionServerAbort

Started up two new builds.

> The initialization order for a fresh cluster is incorrect
> ---------------------------------------------------------
>
>                 Key: HBASE-19694
>                 URL: https://issues.apache.org/jira/browse/HBASE-19694
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Duo Zhang
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.0.0-beta-1
>
>         Attachments: HBASE-19694.branch-2.001.patch, HBASE-19694.branch-2.002.patch, HBASE-19694.branch-2.003.patch, HBASE-19694.branch-2.004.patch
>
>
> The cluster id is set once we become the active master, in finishActiveMasterInitialization, but blockUntilBecomingActiveMaster and finishActiveMasterInitialization are both called in a separate thread so that the HMaster constructor can return without blocking. Since HMaster is itself also an HRegionServer, it creates a Connection and then starts calling reportForDuty. When creating the ConnectionImplementation, we read the cluster id from zk, but the cluster id may not have been set yet because it is set in another thread. In that case we get an exception and use the default cluster id instead.
> I always get this when running UTs which will start a mini cluster
> {noformat}
> 2018-01-03 15:16:37,916 WARN  [M:0;zhangduo-ubuntu:32848] client.ConnectionImplementation(528): Retrieve cluster id failed
> java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/hbaseid
> 	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> 	at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:526)
> 	at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:286)
> 	at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.<init>(ConnectionUtils.java:141)
> 	at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.<init>(ConnectionUtils.java:137)
> 	at org.apache.hadoop.hbase.client.ConnectionUtils.createShortCircuitConnection(ConnectionUtils.java:185)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.createClusterConnection(HRegionServer.java:781)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.setupClusterConnection(HRegionServer.java:812)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:827)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:938)
> 	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:550)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/hbaseid
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:163)
> 	at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:311)
> 	... 1 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
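
For readers of the archive, the ordering problem described in the quoted issue can be sketched in isolation. The following is only an illustration, not HBase code and not the attached patch: FakeStore stands in for ZooKeeper, and the class name, method names, the "default-cluster" value and the id "real-id" are all hypothetical. It shows a reader that falls back to a default id when the /hbase/hbaseid node does not exist yet, and one possible way to avoid the fallback by waiting until the initialization thread has published the id.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class ClusterIdRaceSketch {
    // Hypothetical fallback value; HBase's real default is not taken from the thread.
    static final String DEFAULT_CLUSTER_ID = "default-cluster";
    static final String ID_PATH = "/hbase/hbaseid";

    // Minimal in-memory stand-in for ZooKeeper: path -> data.
    static final class FakeStore {
        private final ConcurrentHashMap<String, String> nodes = new ConcurrentHashMap<>();

        void create(String path, String data) {
            nodes.put(path, data);
        }

        Optional<String> read(String path) {
            return Optional.ofNullable(nodes.get(path));
        }
    }

    // Mirrors the behaviour described in the issue: a missing node
    // results in the default cluster id being used.
    static String retrieveClusterIdOrDefault(FakeStore store) {
        return store.read(ID_PATH).orElse(DEFAULT_CLUSTER_ID);
    }

    // One possible ordering fix (an assumption, not the actual patch):
    // wait for the publisher to signal completion before reading.
    static String retrieveClusterIdAfter(CompletableFuture<Void> published, FakeStore store)
            throws Exception {
        published.get(5, TimeUnit.SECONDS);
        return retrieveClusterIdOrDefault(store);
    }

    public static void main(String[] args) throws Exception {
        FakeStore store = new FakeStore();
        CompletableFuture<Void> published = new CompletableFuture<>();

        // The "region server" side reads before the "master" side has written the id.
        String early = retrieveClusterIdOrDefault(store);

        // The "active master" initialization runs in its own thread and sets the id.
        Thread init = new Thread(() -> {
            store.create(ID_PATH, "real-id");
            published.complete(null);
        });
        init.start();

        // Reading only after the publish signal sees the real id.
        String late = retrieveClusterIdAfter(published, store);
        init.join();

        System.out.println("early=" + early);
        System.out.println("late=" + late);
    }
}
```

In this sketch the early read always happens before the initialization thread is started, so it deterministically returns the fallback value; in the real cluster the outcome depends on thread timing, which is why the warning shows up when starting a mini cluster.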