Date: Thu, 21 Aug 2014 22:10:12 +0000 (UTC)
From: "Lars Hofhansl (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-9746) RegionServer can't start when replication tries to replicate to an unknown host

    [ https://issues.apache.org/jira/browse/HBASE-9746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106036#comment-14106036 ]

Lars Hofhansl commented on HBASE-9746:
--------------------------------------

Agreed. 0.94 and 0.98 have this problem.
trunk (2.0.0) actually stays up but has another problem: After the initial attempt to connect to the slave cluster the region server just gives up and never retries, and thus silently (except for log messages) never replicates to those clusters.
(Haven't checked the 1.0 branch)

Note that the issue here is the ZK ensemble is not reachable.
I think in 0.94 and 0.98 we can fix this by pulling the connectToPeers() logic into the portion that is retried in a loop.
Testing this is hard, though.

> RegionServer can't start when replication tries to replicate to an unknown host
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-9746
>                 URL: https://issues.apache.org/jira/browse/HBASE-9746
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.12
>            Reporter: Lars Hofhansl
>            Priority: Minor
>             Fix For: 0.99.0, 2.0.0, 0.98.7, 0.94.24
>
>
> Just ran into this:
> {code}
> 13/10/11 00:37:02 [regionserver60020] WARN  zookeeper.ZKConfig(204): java.net.UnknownHostException: : Name or service not known
>     at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>     at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:894)
>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1286)
>     at java.net.InetAddress.getAllByName0(InetAddress.java:1239)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1155)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1091)
>     at java.net.InetAddress.getByName(InetAddress.java:1041)
>     at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:201)
>     at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:147)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:127)
>     at org.apache.hadoop.hbase.replication.ReplicationPeer.reloadZkWatcher(ReplicationPeer.java:170)
>     at org.apache.hadoop.hbase.replication.ReplicationPeer.<init>(ReplicationPeer.java:69)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:343)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:308)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:189)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:156)
>     at org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:89)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:3986)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:3955)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1412)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1096)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:749)
>     at java.lang.Thread.run(Thread.java:722)
> 13/10/11 00:37:02 [regionserver60020] ERROR zookeeper.ZKConfig(210): no valid quorum servers found in zoo.cfg
> 13/10/11 00:37:02 [regionserver60020] WARN  regionserver.HRegionServer(1108): Exception in region server :
> java.io.IOException: Unable to determine ZooKeeper ensemble
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:116)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:153)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:127)
>     at org.apache.hadoop.hbase.replication.ReplicationPeer.reloadZkWatcher(ReplicationPeer.java:170)
>     at org.apache.hadoop.hbase.replication.ReplicationPeer.<init>(ReplicationPeer.java:69)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:343)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:308)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:189)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:156)
>     at org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:89)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:3986)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:3955)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1412)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1096)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:749)
>     at java.lang.Thread.run(Thread.java:722)
> 13/10/11 00:37:02 [regionserver60020] INFO  regionserver.HRegionServer(1823): STOPPED: Failed initialization
> 13/10/11 00:37:02 [regionserver60020] ERROR regionserver.HRegionServer(1228): Failed init
> java.io.IOException: Unable to determine ZooKeeper ensemble
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:116)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:153)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:127)
>     at org.apache.hadoop.hbase.replication.ReplicationPeer.reloadZkWatcher(ReplicationPeer.java:170)
>     at org.apache.hadoop.hbase.replication.ReplicationPeer.<init>(ReplicationPeer.java:69)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:343)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:308)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:189)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:156)
>     at org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:89)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:3986)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:3955)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1412)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1096)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:749)
>     at java.lang.Thread.run(Thread.java:722)
> 13/10/11 00:37:02 [regionserver60020] FATAL regionserver.HRegionServer(1898): ABORTING region server XXXXXXXX,60020,1381451821216: Unhandled exception: Unable to determine ZooKeeper ensemble
> java.io.IOException: Unable to determine ZooKeeper ensemble
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:116)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:153)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:127)
>     at org.apache.hadoop.hbase.replication.ReplicationPeer.reloadZkWatcher(ReplicationPeer.java:170)
>     at org.apache.hadoop.hbase.replication.ReplicationPeer.<init>(ReplicationPeer.java:69)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:343)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:308)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:189)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:156)
>     at org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:89)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:3986)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:3955)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1412)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1096)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:749)
>     at java.lang.Thread.run(Thread.java:722)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
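
The idea from the comment above, moving the peer connection into a portion that is retried in a loop instead of letting one failure abort startup, might look roughly like the sketch below. This is not HBase code and not the actual patch: PeerConnectRetry, PeerConnector, connectWithRetry, maxAttempts and sleepMillis are hypothetical names chosen for illustration, and the real connectToPeers() logic may be structured quite differently.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; none of these names exist in HBase. */
final class PeerConnectRetry {

    /** Stand-in for whatever opens the connection to one peer cluster's ZK ensemble. */
    interface PeerConnector {
        void connect(String peerId) throws IOException;
    }

    /**
     * Tries to connect to every peer. A peer that fails is logged and retried
     * on the next pass rather than aborting the caller. Gives up after
     * maxAttempts passes (or on interrupt) and returns the peers that are
     * still unconnected.
     */
    static List<String> connectWithRetry(List<String> peerIds, PeerConnector connector,
                                         int maxAttempts, long sleepMillis) {
        List<String> pending = new ArrayList<>(peerIds);
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            List<String> failed = new ArrayList<>();
            for (String peerId : pending) {
                try {
                    connector.connect(peerId);
                } catch (IOException e) {
                    // An unreachable peer should not take the whole server down.
                    System.err.println("attempt " + attempt + ": peer " + peerId
                            + " not reachable: " + e.getMessage());
                    failed.add(peerId);
                }
            }
            pending = failed;
            if (!pending.isEmpty() && attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    break;
                }
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        // Toy connector: the peer named "unknown-host" never resolves.
        List<String> left = connectWithRetry(
                Arrays.asList("peer1", "unknown-host"),
                peerId -> {
                    if (peerId.equals("unknown-host")) {
                        throw new IOException("Unable to determine ZooKeeper ensemble");
                    }
                },
                3, 10L);
        System.out.println("still unconnected: " + left);
    }
}
```

With this shape, a caller can start up with whichever peers are reachable and decide separately what to do with the ones that are returned as still pending, for example scheduling further retries in the background. That choice is an assumption of the sketch, not something stated in the issue.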