Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 20649CB19 for ; Thu, 24 May 2012 14:24:56 +0000 (UTC) Received: (qmail 69573 invoked by uid 500); 24 May 2012 14:24:56 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 69509 invoked by uid 500); 24 May 2012 14:24:55 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 69500 invoked by uid 99); 24 May 2012 14:24:55 -0000 Received: from issues-vm.apache.org (HELO issues-vm) (140.211.11.160) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 24 May 2012 14:24:55 +0000 Received: from isssues-vm.apache.org (localhost [127.0.0.1]) by issues-vm (Postfix) with ESMTP id B743214280B for ; Thu, 24 May 2012 14:24:55 +0000 (UTC) Date: Thu, 24 May 2012 14:24:55 +0000 (UTC) From: "ramkrishna.s.vasudevan (JIRA)" To: issues@hbase.apache.org Message-ID: <232145010.281.1337869495752.JavaMail.jiratomcat@issues-vm> In-Reply-To: <1764504354.126.1337866443160.JavaMail.jiratomcat@issues-vm> Subject: [jira] [Commented] (HBASE-6088) Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282518#comment-13282518 ] ramkrishna.s.vasudevan commented on HBASE-6088: ----------------------------------------------- While we start doing the split, there are two steps in zk node creation. -> Create the node -> Write the data RS_ZK_SPLITTING into it. Now after both the steps are completed we make an journal entry. Now if writing the data fails even on rollback we are not able to clean the node as we don't know the current journal entry. > Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node > ---------------------------------------------------------------------------------------------------- > > Key: HBASE-6088 > URL: https://issues.apache.org/jira/browse/HBASE-6088 > Project: HBase > Issue Type: Bug > Affects Versions: 0.94.0 > Reporter: Gopinathan A > Fix For: 0.94.1 > > > Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node > {noformat} > 2012-05-24 01:45:41,363 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26668ms for sessionid 0x1377a75f41d0012, closing socket connection and attempting reconnect > 2012-05-24 01:45:41,464 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/bd1079bf948c672e493432020dc0e144 > {noformat} > {noformat} > 2012-05-24 01:45:43,300 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: cleanupCurrentWriter waiting for transactions to get synced total 189377 synced till here 189365 > 2012-05-24 01:45:48,474 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144. > java.io.IOException: Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144. > at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:242) > at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450) > at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/unassigned/bd1079bf948c672e493432020dc0e144 > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246) > at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:321) > at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:659) > at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:811) > at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:747) > at org.apache.hadoop.hbase.regionserver.SplitTransaction.transitionNodeSplitting(SplitTransaction.java:919) > at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:869) > at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239) > ... 5 more > 2012-05-24 01:45:48,476 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Successful rollback of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144. > {noformat} > {noformat} > 2012-05-24 01:47:28,141 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/bd1079bf948c672e493432020dc0e144 already exists and this is not a retry > 2012-05-24 01:47:28,142 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144 > java.io.IOException: Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144 > at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:865) > at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239) > at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450) > at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67) > {noformat} > Due to the above exception, region splitting was failing contineously more than 5hrs -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira