From: "Andrew Purtell (JIRA)"
To: hbase-dev@hadoop.apache.org
Date: Sun, 29 Mar 2009 18:16:50 -0700 (PDT)
Subject: [jira] Updated: (HBASE-1232) zookeeper client wont reconnect if there is a problem
Message-ID: <238735325.1238375810839.JavaMail.jira@brutus>
In-Reply-To: <836995834.1236037676350.JavaMail.jira@brutus>
Reply-To: hbase-dev@hadoop.apache.org
Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm

     [ https://issues.apache.org/jira/browse/HBASE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1232:
----------------------------------
      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed to trunk.

> zookeeper client wont reconnect if there is a problem
> -----------------------------------------------------
>
>                 Key: HBASE-1232
>                 URL: https://issues.apache.org/jira/browse/HBASE-1232
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: java 1.7, zookeeper 3.0.1
>            Reporter: ryan rawson
>            Assignee: Nitay Joffe
>            Priority: Critical
>             Fix For: 0.20.0
>
>         Attachments: hbase-1232-v2.patch, hbase-1232.patch
>
>
> my regionserver got wedged:
> 2009-03-02 15:43:30,938 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to create /hbase:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:87)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:35)
>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:482)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:219)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureParentExists(ZooKeeperWrapper.java:240)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.checkOutOfSafeMode(ZooKeeperWrapper.java:328)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:783)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:468)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:443)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:518)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:477)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:450)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:295)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:919)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:950)
>         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1370)
>         at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1314)
>         at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1294)
>         at org.apache.hadoop.hbase.RegionHistorian.add(RegionHistorian.java:237)
>         at org.apache.hadoop.hbase.RegionHistorian.add(RegionHistorian.java:216)
>         at org.apache.hadoop.hbase.RegionHistorian.addRegionSplit(RegionHistorian.java:174)
>         at org.apache.hadoop.hbase.regionserver.HRegion.splitRegion(HRegion.java:607)
>         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:174)
>         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:107)
> this message repeats over and over.
> Looking at the code in question:
>   private boolean ensureExists(final String znode) {
>     try {
>       zooKeeper.create(znode, new byte[0],
>                        Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
>       LOG.debug("Created ZNode " + znode);
>       return true;
>     } catch (KeeperException.NodeExistsException e) {
>       return true;      // ok, move on.
>     } catch (KeeperException.NoNodeException e) {
>       return ensureParentExists(znode) && ensureExists(znode);
>     } catch (KeeperException e) {
>       LOG.warn("Failed to create " + znode + ":", e);
>     } catch (InterruptedException e) {
>       LOG.warn("Failed to create " + znode + ":", e);
>     }
>     return false;
>   }
> We need to catch this exception specifically and reopen the ZK connection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
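
For readers of the archive, the approach suggested in the description (catch the session-expired case specifically and reopen the connection) could be sketched roughly as below. This is not the committed patch: `ReconnectSketch`, `Client`, `FakeClient`, `Wrapper`, `MAX_RECONNECTS`, and the local `SessionExpiredException` are all hypothetical stand-ins so the example runs without the ZooKeeper library. The underlying assumption is that an expired session cannot be revived, so the wrapper has to build a new client handle before retrying.

```java
import java.util.function.Supplier;

// Hypothetical sketch, NOT the HBASE-1232 patch: every type here is a stand-in
// so the example compiles and runs without the ZooKeeper library.
class ReconnectSketch {

    // Stand-in for org.apache.zookeeper.KeeperException.SessionExpiredException.
    static class SessionExpiredException extends Exception {
        SessionExpiredException(String znode) {
            super("Session expired for " + znode);
        }
    }

    // Stand-in for the small part of a ZooKeeper client handle used here.
    interface Client {
        void create(String znode) throws SessionExpiredException;
    }

    // Fake client whose session is either healthy or permanently expired.
    static class FakeClient implements Client {
        private final boolean expired;

        FakeClient(boolean expired) {
            this.expired = expired;
        }

        @Override
        public void create(String znode) throws SessionExpiredException {
            if (expired) {
                throw new SessionExpiredException(znode);
            }
        }
    }

    // Wrapper that reopens the connection when the session has expired.
    static class Wrapper {
        // Assumed bound, so a persistent outage cannot cause endless reconnects.
        static final int MAX_RECONNECTS = 1;

        private final Supplier<Client> factory;
        private Client client;
        int reconnects = 0;

        Wrapper(Client initial, Supplier<Client> factory) {
            this.client = initial;
            this.factory = factory;
        }

        boolean ensureExists(String znode) {
            for (int attempt = 0; ; attempt++) {
                try {
                    client.create(znode);
                    return true;
                } catch (SessionExpiredException e) {
                    if (attempt >= MAX_RECONNECTS) {
                        return false; // give up instead of repeating forever
                    }
                    // The old handle is unusable: replace it, then retry.
                    client = factory.get();
                    reconnects++;
                }
            }
        }
    }

    public static void main(String[] args) {
        Wrapper w = new Wrapper(new FakeClient(true), () -> new FakeClient(false));
        boolean created = w.ensureExists("/hbase");
        System.out.println("created=" + created + ", reconnects=" + w.reconnects);
    }
}
```

The retry is bounded on purpose: the log above shows the same warning repeating indefinitely, and an unbounded reconnect loop would only move that symptom elsewhere. A real implementation would also need to re-register watches and handle the other `KeeperException` cases shown in the quoted method.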