hbase-dev mailing list archives

From "Jonathan Gray" <jg...@apache.org>
Subject Re: Review Request: HBASE-2700 Unit test of master failover while regions in transition
Date Mon, 18 Oct 2010 18:01:23 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/995/
-----------------------------------------------------------

(Updated 2010-10-18 11:01:23.149541)


Review request for hbase and stack.


Changes
-------

Finishes the third unit test, which combines regions in transition (RIT) with the failure of
an RS that happens while no master is running.

All three unit tests are passing for me in Eclipse and on the command line (simple failover,
RIT failover, RIT+RS failover).

I think this should be enough to resolve 2700 and maybe some others.


Summary
-------

First go at a unit test of master failover with regions in transition.

Comment from the test method:

  /**
   * Complex test of master failover that tests as many permutations as
   * possible of the states that regions in transition could be in within ZK.
   * <p>
   * This tests the proper handling of these states by the failed-over master
   * and includes a thorough testing of the timeout code as well.
   * <p>
   * Starts with a single master and three regionservers.
   * <p>
   * Creates two tables, enabledTable and disabledTable, each containing 5
   * regions.  The disabledTable is then disabled.
   * <p>
   * After reaching steady-state, the master is killed.  We then mock several
   * states in ZK.
   * <p>
   * After mocking them, we will start up a new master which should become the
   * active master and also detect that it is a failover.  The primary test
   * passing condition will be that all regions of the enabled table are
   * assigned and all the regions of the disabled table are not assigned.
   * <p>
   * The different scenarios to be tested are below:
   * <p>
   * <b>ZK State:  OFFLINE</b>
   * <p>A node can get into OFFLINE state if</p>
   * <ul>
   * <li>An RS fails to open a region, so it reverts the state back to OFFLINE
   * <li>The master is assigning the region to an RS and has not yet sent the RPC
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>Master has assigned an enabled region but RS failed so a region is
   *     not assigned anywhere and is sitting in ZK as OFFLINE</li>
   * <li>This seems to cover both cases?</li>
   * </ul>
   * <p>
   * <b>ZK State:  CLOSING</b>
   * <p>A node can get into CLOSING state if</p>
   * <ul>
   * <li>An RS has begun to close a region
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>Region was being closed but the RS died before finishing the close
   * <li>Region of enabled table was being closed but did not complete
   * <li>Region of disabled table was being closed but did not complete
   * </ul>
   * <p>
   * <b>ZK State:  CLOSED</b>
   * <p>A node can get into CLOSED state if</p>
   * <ul>
   * <li>An RS has finished closing a region but the master has not yet acknowledged it
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>Region of a table that should be enabled was closed on an RS
   * <li>Region of a table that should be disabled was closed on an RS
   * </ul>
   * <p>
   * <b>ZK State:  OPENING</b>
   * <p>A node can get into OPENING state if</p>
   * <ul>
   * <li>An RS has begun to open a region
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>RS was opening a region of enabled table but never finished
   * </ul>
   * <p>
   * <b>ZK State:  OPENED</b>
   * <p>A node can get into OPENED state if</p>
   * <ul>
   * <li>An RS has finished opening a region but the master has not yet acknowledged it
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>Region of a table that should be enabled was opened on an RS
   * <li>Region of a table that should be disabled was opened on an RS
   * <li>Region of a table that should be enabled was opened by a now-dead RS
   * <li>Region of a table that should be disabled was opened by a now-dead RS
   * </ul>
   * <p>
   * <b>ZK State:  NONE</b>
   * <p>A region might have no transition node if</p>
   * <ul>
   * <li>The server hosting the region died and no master processed it
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>Region of enabled table was on a dead RS that was not yet processed
   * <li>Region of disabled table was on a dead RS that was not yet processed
   * </ul>
   * @throws Exception
   */
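
To make the primary passing condition from the comment above concrete, here is a minimal
sketch. It is not the code in TestMasterFailover and uses no HBase classes: the Scenario
type, the region names, and the passes() helper are all hypothetical, invented only to
illustrate the rule that a region ends up assigned after failover if and only if its table
is enabled, whichever ZK state was mocked for it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: none of these types are HBase classes.
public class FailoverCheckSketch {

    // The ZK states listed in the Javadoc; NONE means no transition node exists.
    enum ZkState { OFFLINE, CLOSING, CLOSED, OPENING, OPENED, NONE }

    // One mocked scenario: a region, whether its table is enabled,
    // and the ZK state mocked for it before the new master starts.
    static final class Scenario {
        final String region;        // hypothetical region name
        final boolean tableEnabled;
        final ZkState state;

        Scenario(String region, boolean tableEnabled, ZkState state) {
            this.region = region;
            this.tableEnabled = tableEnabled;
            this.state = state;
        }
    }

    // Primary passing condition: after failover a region is assigned
    // if and only if its table is enabled, regardless of the mocked state.
    static boolean passes(List<Scenario> scenarios, Set<String> assignedAfterFailover) {
        for (Scenario s : scenarios) {
            if (assignedAfterFailover.contains(s.region) != s.tableEnabled) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Scenario> scenarios = Arrays.asList(
            new Scenario("enabled-1", true, ZkState.OFFLINE),
            new Scenario("enabled-2", true, ZkState.CLOSING),
            new Scenario("disabled-1", false, ZkState.CLOSING),
            new Scenario("enabled-3", true, ZkState.OPENED),
            new Scenario("disabled-2", false, ZkState.OPENED),
            new Scenario("disabled-3", false, ZkState.NONE));

        // Pretend only the enabled regions came back assigned.
        Set<String> assigned =
            new HashSet<>(Arrays.asList("enabled-1", "enabled-2", "enabled-3"));
        System.out.println(passes(scenarios, assigned)); // prints "true"

        // A disabled region that ends up assigned violates the condition.
        assigned.add("disabled-2");
        System.out.println(passes(scenarios, assigned)); // prints "false"
    }
}
```

The real test would gather the assigned set from the running mini cluster; the sketch only
shows the shape of the final check.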


This addresses bug HBASE-2700.
    http://issues.apache.org/jira/browse/HBASE-2700


Diffs (updated)
-----

  trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 1023927 
  trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1023927 
  trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1023927 
  trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1023927 
  trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java 1023927 
  trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1023927 
  trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 1023927 
  trunk/src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java 1023927 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1023927 
  trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 1023927 
  trunk/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java 1023927 
  trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java 1023927 

Diff: http://review.cloudera.org/r/995/diff


Testing
-------

running the unit test!


Thanks,

Jonathan

