hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2700) Handle master failover for regions in transition
Date Thu, 07 Oct 2010 23:38:30 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919101#action_12919101
] 

HBase Review Board commented on HBASE-2700:
-------------------------------------------

Message from: "Jonathan Gray" <jgray@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/995/
-----------------------------------------------------------

Review request for hbase and stack.


Summary
-------

First go at a unit test of master failover with regions in transition.

Comment from the test method:

  /**
   * Complex test of master failover that tests as many permutations of the
   * different possible states that regions in transition could be in within ZK.
   * <p>
   * This tests the proper handling of these states by the failed-over master
   * and includes a thorough testing of the timeout code as well.
   * <p>
   * Starts with a single master and three regionservers.
   * <p>
   * Creates two tables, enabledTable and disabledTable, each containing 5
   * regions.  The disabledTable is then disabled.
   * <p>
   * After reaching steady-state, the master is killed.  We then mock several
   * states in ZK.
   * <p>
   * After mocking them, we will startup a new master which should become the
   * active master and also detect that it is a failover.  The primary test
   * passing condition will be that all regions of the enabled table are
   * assigned and all the regions of the disabled table are not assigned.
   * <p>
   * The different scenarios to be tested are below:
   * <p>
   * <b>ZK State:  OFFLINE</b>
   * <p>A node can get into OFFLINE state if</p>
   * <ul>
   * <li>An RS fails to open a region, so it reverts the state back to OFFLINE
   * <li>The Master is assigning the region to a RS before it sends RPC
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>Master has assigned an enabled region but RS failed so a region is
   *     not assigned anywhere and is sitting in ZK as OFFLINE</li>
   * <li>This seems to cover both cases?</li>
   * </ul>
   * <p>
   * <b>ZK State:  CLOSING</b>
   * <p>A node can get into CLOSING state if</p>
   * <ul>
   * <li>An RS has begun to close a region
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>Region was being closed but the RS died before finishing the close
   * <li>Region of enabled table was being closed but did not complete
   * <li>Region of disabled table was being closed but did not complete
   * </ul>
   * <p>
   * <b>ZK State:  CLOSED</b>
   * <p>A node can get into CLOSED state if</p>
   * <ul>
   * <li>An RS has completed closing a region but not acknowledged by master yet
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>Region of a table that should be enabled was closed on an RS
   * <li>Region of a table that should be disabled was closed on an RS
   * </ul>
   * <p>
   * <b>ZK State:  OPENING</b>
   * <p>A node can get into OPENING state if</p>
   * <ul>
   * <li>An RS has begun to open a region
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>RS was opening a region of enabled table but never finishes
   * </ul>
   * <p>
   * <b>ZK State:  OPENED</b>
   * <p>A node can get into OPENED state if</p>
   * <ul>
   * <li>An RS has finished opening a region but not acknowledged by master yet
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>Region of a table that should be enabled was opened on an RS
   * <li>Region of a table that should be disabled was opened on an RS
   * <li>Region of a table that should be enabled was opened by a now-dead RS
   * <li>Region of a table that should be disabled was opened by a now-dead RS
   * </ul>
   * <p>
   * <b>ZK State:  NONE</b>
   * <p>A region could not have a transition node if</p>
   * <ul>
   * <li>The server hosting the region died and no master processed it
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>Region of enabled table was on a dead RS that was not yet processed
   * <li>Region of disabled table was on a dead RS that was not yet processed
   * </ul>
   * @throws Exception
   */


This addresses bug HBASE-2700.
    http://issues.apache.org/jira/browse/HBASE-2700


Diffs
-----

  trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1005264 
  trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1005264 
  trunk/src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java 1005264 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1005264 
  trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 1005264 
  trunk/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java 1005264 
  trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java 1005264 

Diff: http://review.cloudera.org/r/995/diff


Testing
-------

running the unit test!


Thanks,

Jonathan




> Handle master failover for regions in transition
> ------------------------------------------------
>
>                 Key: HBASE-2700
>                 URL: https://issues.apache.org/jira/browse/HBASE-2700
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master, zookeeper
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.90.0
>
>
> To this point in HBASE-2692 tasks we have moved everything for regions in transition
into ZK, but we have not fully handled the master failover case.  This is to deal with that
and to write tests for it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message