hbase-dev mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-11922) copyQueuesFromRSUsingMulti may fail to clean up properly if zk.useMulti is true and there are orphaned queues
Date Tue, 09 Sep 2014 20:49:28 GMT
Andrew Purtell created HBASE-11922:

             Summary: copyQueuesFromRSUsingMulti may fail to clean up properly if zk.useMulti is true and there are orphaned queues
                 Key: HBASE-11922
                 URL: https://issues.apache.org/jira/browse/HBASE-11922
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.98.6
            Reporter: Andrew Purtell
            Priority: Minor

To reproduce: set hbase.zookeeper.useMulti to true in the site configuration, start an
all-localhost cluster, create a table with a column family whose replication scope is 1,
add a peer (it doesn't have to be a live endpoint), remove the peer, and restart the
regionserver.  Observe:

2014-09-09 13:39:23,497 WARN  [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Got exception in copyQueuesFromRSUsingMulti:
org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:620)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1530)
	at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.copyQueuesFromRSUsingMulti(ReplicationQueuesZKImpl.java:335)
	at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.claimQueues(ReplicationQueuesZKImpl.java:167)
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:520)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

This is because there was an orphaned queue left over from the removed peer.

If ZK rolls back state after a failed multi (need to check, but let's assume so for now),
then other ops bundled into the multi-op by copyQueuesFromRSUsingMulti will be rolled back,
which might not be what we want.
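
The concern above can be sketched with a toy in-memory model. It assumes that
ZooKeeper's multi is all-or-nothing (its documentation describes it that way), and
that the batch skips queues whose peer no longer exists and then deletes the old
regionserver's node. Both are assumptions for illustration, not a reading of the
actual HBase code. The names Tree, Op, and claimOps and the /rs/... paths are
hypothetical; none of this is the ZooKeeper or HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy model of an all-or-nothing batch; NOT the ZooKeeper client API. */
public class MultiRollbackSketch {

    /** One batched operation: create or delete a single path. */
    static final class Op {
        final boolean create;
        final String path;
        Op(boolean create, String path) { this.create = create; this.path = path; }
        static Op create(String path) { return new Op(true, path); }
        static Op delete(String path) { return new Op(false, path); }
    }

    /** A sorted set of paths standing in for the znode tree. */
    static final class Tree {
        final TreeSet<String> paths = new TreeSet<>();

        boolean hasChildren(String path) {
            String prefix = path + "/";
            String next = paths.ceiling(prefix);
            return next != null && next.startsWith(prefix);
        }

        /** Applies the ops to a scratch copy and commits only if all of them succeed. */
        boolean multi(List<Op> ops) {
            Tree scratch = new Tree();
            scratch.paths.addAll(paths);
            for (Op op : ops) {
                if (op.create) {
                    if (!scratch.paths.add(op.path)) return false;       // node exists
                } else {
                    if (!scratch.paths.contains(op.path)) return false;  // no such node
                    if (scratch.hasChildren(op.path)) return false;      // not empty
                    scratch.paths.remove(op.path);
                }
            }
            paths.clear();
            paths.addAll(scratch.paths);
            return true;
        }
    }

    /**
     * Builds the batch under the assumption stated above: queues whose peer is
     * unknown (orphans) are skipped, then the old regionserver's node is deleted.
     */
    static List<Op> claimOps(String oldRs, String newRs,
                             List<String> queues, List<String> knownPeers) {
        String oldName = oldRs.substring(oldRs.lastIndexOf('/') + 1);
        List<Op> ops = new ArrayList<>();
        for (String queue : queues) {
            if (!knownPeers.contains(queue)) continue;  // orphan: peer was removed
            ops.add(Op.create(newRs + "/" + queue + "-" + oldName));
            ops.add(Op.delete(oldRs + "/" + queue));
        }
        ops.add(Op.delete(oldRs));  // fails while an orphaned queue is still a child
        return ops;
    }

    public static void main(String[] args) {
        Tree tree = new Tree();
        tree.paths.addAll(Arrays.asList("/rs/old", "/rs/old/1", "/rs/old/2", "/rs/new"));
        // Peer 2 was removed, so queue 2 under /rs/old is orphaned.
        List<Op> ops = claimOps("/rs/old", "/rs/new",
                Arrays.asList("1", "2"), Arrays.asList("1"));
        System.out.println("multi succeeded: " + tree.multi(ops));
        System.out.println("tree afterwards: " + tree.paths);
    }
}
```

In this model the final delete fails because the orphaned queue is still a child of
the old regionserver's node, so the whole batch is discarded and the copy of the
valid queue for peer 1 is lost along with it. If real multi behaves the same way, no
queues would be claimed at all while an orphan is present.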

This message was sent by Atlassian JIRA