hbase-issues mailing list archives

From "jiraposter@reviews.apache.org (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4265) zookeeper.KeeperException$NodeExistsException if HMaster restarts while table is being disabled
Date Wed, 31 Aug 2011 01:39:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094243#comment-13094243 ]

jiraposter@reviews.apache.org commented on HBASE-4265:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1685/
-----------------------------------------------------------

Review request for hbase, Ted Yu and ramkrishna vasudevan.


Summary
-------

The problem is that disableTable tries to work on regions that are in transition. disableTable
already has code to bypass such regions, but at startup recoverTableInDisablingState is called
before processRegionsInTransition (which updates the regions-in-transition list). As a result,
the regions-in-transition list has not been updated yet when recoverTableInDisablingState is
called.

The fix is to postpone recoverTableInDisablingState until after processRegionsInTransition has
been called.
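
To illustrate why the call order matters, here is a minimal toy model. It is not HBase code and
not the actual patch: the class StartupOrderSketch, its fields, and the startup helper are
hypothetical, and plain in-memory sets stand in for the znodes under /hbase/unassigned and for
the master's regions-in-transition list. Only the two method names are borrowed from the
summary above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: names mirror the review summary, but none of this is HBase code.
class StartupOrderSketch {
    // Stand-in for the znodes under /hbase/unassigned.
    final Set<String> unassignedZnodes = new HashSet<>();
    // Stand-in for the master's in-memory regions-in-transition list.
    final Set<String> regionsInTransition = new HashSet<>();
    // Regions of the table that was being disabled when the master stopped.
    final List<String> disablingTableRegions = new ArrayList<>();

    // Loads every existing unassigned znode into the in-memory list.
    void processRegionsInTransition() {
        regionsInTransition.addAll(unassignedZnodes);
    }

    // Resumes the disable: skips regions already in transition, unassigns the rest.
    void recoverTableInDisablingState() {
        for (String region : disablingTableRegions) {
            if (regionsInTransition.contains(region)) {
                continue; // bypass regions that are known to be in transition
            }
            // Creating a node for the region fails if one already exists.
            if (!unassignedZnodes.add(region)) {
                throw new IllegalStateException("NodeExists for /hbase/unassigned/" + region);
            }
            regionsInTransition.add(region);
        }
    }

    // Runs the two startup steps in the requested order; returns true if no error occurred.
    static boolean startup(boolean recoverFirst) {
        StartupOrderSketch m = new StartupOrderSketch();
        m.disablingTableRegions.add("regionA");
        m.disablingTableRegions.add("regionB");
        m.unassignedZnodes.add("regionA"); // node left behind before the restart
        try {
            if (recoverFirst) {
                m.recoverTableInDisablingState();
                m.processRegionsInTransition();
            } else {
                m.processRegionsInTransition();
                m.recoverTableInDisablingState();
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("recover first: " + startup(true));
        System.out.println("process first: " + startup(false));
    }
}
```

In this model, running the recovery step first hits the leftover node because the in-memory
list is still empty, while running it second lets the existing bypass check skip that region.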


This addresses bug hbase-4265.
    https://issues.apache.org/jira/browse/hbase-4265


Diffs
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
1163346 

Diff: https://reviews.apache.org/r/1685/diff


Testing
-------

On a small cluster, stop HMaster while disableTable is in progress. Make sure there are some
regions in transition in zk when the HMaster shutdown occurs. Without the fix, we get the
NodeExistsException from the issue title. With the fix, HMaster can continue the disabling
process after restart and the table reaches the disabled state.


Thanks,

Ming



> zookeeper.KeeperException$NodeExistsException if HMaster restarts while table is being disabled
> -----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4265
>                 URL: https://issues.apache.org/jira/browse/HBASE-4265
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.92.0
>
>
> There seems to be more than one issue in the following scenario. I will provide a fix later
> for just this exception.
> 1. A table is being disabled.
> 2. HMaster is restarted.
> 3. At startup, HMaster tries to transition the table from the disabling to the disabled state
> and gets the following exception:
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/419b902243c836c285108ba555b712fa
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
> 	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:475)
> 	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:457)
> 	at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:742)
> 	at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:461)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1440)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1406)
> 	at org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler$1.run(DisableTableHandler.java:141)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> The issue is that this specific region was in a special state before HMaster restarted: it
> had been closed properly by the RS, so the zk state is RS_ZK_REGION_CLOSED. However, HMaster
> had not yet had a chance to process ClosedRegionHandler, so the node remains in zk. After the
> restart, the region is first added to the online region list in
> AssignmentManager.rebuildUserRegions, and HMaster tries to unassign it later.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
