hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-1232) zookeeper client wont reconnect if there is a problem
Date Mon, 02 Mar 2009 23:47:56 GMT
zookeeper client wont reconnect if there is a problem
-----------------------------------------------------

                 Key: HBASE-1232
                 URL: https://issues.apache.org/jira/browse/HBASE-1232
             Project: Hadoop HBase
          Issue Type: Bug
         Environment: java 1.7, zookeeper 3.0.1
            Reporter: ryan rawson
            Assignee: Jean-Daniel Cryans
             Fix For: 0.20.0


my regionserver got wedged:
2009-03-02 15:43:30,938 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to
create /hbase:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
for /hbase
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:87)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:35)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:482)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:219)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureParentExists(ZooKeeperWrapper.java:240)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.checkOutOfSafeMode(ZooKeeperWrapper.java:328)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:783)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:468)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:443)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:518)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:477)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:450)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:295)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:919)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:950)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1370)
        at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1314)
        at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1294)
        at org.apache.hadoop.hbase.RegionHistorian.add(RegionHistorian.java:237)
        at org.apache.hadoop.hbase.RegionHistorian.add(RegionHistorian.java:216)
        at org.apache.hadoop.hbase.RegionHistorian.addRegionSplit(RegionHistorian.java:174)
        at org.apache.hadoop.hbase.regionserver.HRegion.splitRegion(HRegion.java:607)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:174)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:107)


this message repeats over and over.  

Looking at the code in question:
  private boolean ensureExists(final String znode) {
    try {
      zooKeeper.create(znode, new byte[0],
                       Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
      LOG.debug("Created ZNode " + znode);
      return true;
    } catch (KeeperException.NodeExistsException e) {
      return true;      // ok, move on.
    } catch (KeeperException.NoNodeException e) {
      return ensureParentExists(znode) && ensureExists(znode);
    } catch (KeeperException e) {
      LOG.warn("Failed to create " + znode + ":", e);
    } catch (InterruptedException e) {
      LOG.warn("Failed to create " + znode + ":", e);
    }
    return false;
  }

We need to catch this exception specifically and reopen the ZK connection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message