Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 24 May 2012 14:24:55 +0000 (UTC)
From: "ramkrishna.s.vasudevan (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <232145010.281.1337869495752.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <1764504354.126.1337866443160.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (HBASE-6088)  Region splitting not happened for
 long time due to ZK exception while creating RS_ZK_SPLITTING node
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282518#comment-13282518 ] 

ramkrishna.s.vasudevan commented on HBASE-6088:
-----------------------------------------------

While we start doing the split, there are two steps in zk node creation.
-> Create the node
-> Write the data RS_ZK_SPLITTING into it.
Now after both the steps are completed we make an journal entry.  
Now if writing the data fails even on rollback we are not able to clean the node as we don't know the current journal entry.  
                
>  Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6088
>                 URL: https://issues.apache.org/jira/browse/HBASE-6088
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.0
>            Reporter: Gopinathan A
>             Fix For: 0.94.1
>
>
> Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node
> {noformat}
> 2012-05-24 01:45:41,363 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26668ms for sessionid 0x1377a75f41d0012, closing socket connection and attempting reconnect
> 2012-05-24 01:45:41,464 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
> {noformat}
> {noformat}
> 2012-05-24 01:45:43,300 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: cleanupCurrentWriter  waiting for transactions to get synced  total 189377 synced till here 189365
> 2012-05-24 01:45:48,474 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
> java.io.IOException: Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
> 	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:242)
> 	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
> 	at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
> 	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:321)
> 	at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:659)
> 	at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:811)
> 	at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:747)
> 	at org.apache.hadoop.hbase.regionserver.SplitTransaction.transitionNodeSplitting(SplitTransaction.java:919)
> 	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:869)
> 	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
> 	... 5 more
> 2012-05-24 01:45:48,476 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Successful rollback of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
> {noformat}
> {noformat}
> 2012-05-24 01:47:28,141 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/bd1079bf948c672e493432020dc0e144 already exists and this is not a retry
> 2012-05-24 01:47:28,142 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
> java.io.IOException: Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
> 	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:865)
> 	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
> 	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
> 	at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
> {noformat}
> Due to the above exception, region splitting was failing contineously more than 5hrs

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira