hbase-issues mailing list archives

From "Hadoop QA (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master
Date Wed, 16 Nov 2011 23:57:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151652#comment-13151652 ]

Hadoop QA commented on HBASE-4796:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12503988/4796-v2.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/273//console

This message is automatically generated.
                
> Race between SplitRegionHandlers for the same region kills the master
> ---------------------------------------------------------------------
>
>                 Key: HBASE-4796
>                 URL: https://issues.apache.org/jira/browse/HBASE-4796
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.92.0, 0.94.0
>
>         Attachments: 4796-v2.txt, 4796.txt
>
>
> I just saw that multiple SplitRegionHandlers can be created for the same region because of the RS tickling, but it becomes deadly when more than one tries to delete the znode at the same time:
> {quote}
> 2011-11-16 02:25:28,778 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, region=f80b6a904048a99ce88d61420b8906d1
> 2011-11-16 02:25:28,780 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, region=f80b6a904048a99ce88d61420b8906d1
> 2011-11-16 02:25:28,796 DEBUG org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT event for f80b6a904048a99ce88d61420b8906d1; deleting node
> 2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Deleting existing unassigned node for f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
> 2011-11-16 02:25:28,804 DEBUG org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT event for f80b6a904048a99ce88d61420b8906d1; deleting node
> 2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Deleting existing unassigned node for f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
> 2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Successfully deleted unassigned node for region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT
> 2011-11-16 02:25:28,821 INFO org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT report); parent=TestTable,0000006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. daughter a=TestTable,0000006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter b=TestTable,0000007054,1321410325564.1b82eeb5d230c47ccc51c08256134839.
> 2011-11-16 02:25:28,829 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this is not a retry
> 2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error deleting SPLIT node in ZK for transition ZK node (f80b6a904048a99ce88d61420b8906d1)
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> 	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
> 	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107)
> 	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884)
> 	at org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:506)
> 	at org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:453)
> 	at org.apache.hadoop.hbase.master.handler.SplitRegionHandler.process(SplitRegionHandler.java:95)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> {quote}
> Stack and I came up with the solution that we just need to handle that exception, because handleSplitReport is an in-memory thing.
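
The race in the log above, and the proposed remedy of handling the exception, can be illustrated with a minimal sketch. This is not the attached patch and does not use the HBase or ZooKeeper APIs: FakeZk, NoNodeException, deleteSplitNode and runRace are hypothetical stand-ins written only for this example, with an in-memory map playing the role of the /hbase/unassigned znodes. The assumption is that a handler which finds the node already gone can treat that as benign instead of aborting.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TolerantSplitNodeDelete {

    /** Hypothetical stand-in for ZooKeeper's NoNode error; not the real class. */
    public static class NoNodeException extends Exception {
        public NoNodeException(String path) {
            super("KeeperErrorCode = NoNode for " + path);
        }
    }

    /** Hypothetical in-memory stand-in for the unassigned znodes. */
    public static class FakeZk {
        private final ConcurrentHashMap<String, String> nodes = new ConcurrentHashMap<>();

        public void create(String path, String state) {
            nodes.put(path, state);
        }

        /** Like a ZK delete: fails if the node is already gone. */
        public void delete(String path) throws NoNodeException {
            if (nodes.remove(path) == null) {
                throw new NoNodeException(path);
            }
        }
    }

    /**
     * Deletes the node, treating "already deleted" as a lost race
     * rather than a fatal error. Returns true only for the caller
     * that actually removed the node.
     */
    public static boolean deleteSplitNode(FakeZk zk, String path) {
        try {
            zk.delete(path);
            return true;
        } catch (NoNodeException e) {
            // Another handler for the same region got there first.
            return false;
        }
    }

    /** Runs several handlers against one node; returns how many deleted it. */
    public static int runRace(int handlers) {
        FakeZk zk = new FakeZk();
        String path = "/hbase/unassigned/f80b6a904048a99ce88d61420b8906d1";
        zk.create(path, "RS_ZK_REGION_SPLIT");

        ExecutorService pool = Executors.newFixedThreadPool(handlers);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        for (int i = 0; i < handlers; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all handlers together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (deleteSplitNode(zk, path)) {
                    winners.incrementAndGet();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("handlers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("handlers that deleted the node: " + runRace(2));
    }
}
```

In this sketch exactly one handler removes the node and the other returns quietly, which mirrors the reasoning in the description: the split report is processed in memory, so the loser of the race has nothing left to do.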

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
