hbase-dev mailing list archives

From "Jonathan Gray" <jg...@apache.org>
Subject Re: Review Request: Cleanup of RIT timeouts and server shutdown handling
Date Mon, 01 Nov 2010 18:47:35 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated 2010-11-01 11:47:35.522039)

Review request for hbase and stack.


This fixes a newly found bug.

When we processed server shutdown, we read META for regions on the dead server.  But we were
only comparing HServerAddress, not HServerInfo.  When a rolling restart is acting up (or any
time an RS comes back up w/ a different startcode) this would cause double assignment.

Checks against HSI now.
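
To illustrate the difference, here is a minimal sketch. It does not use the real HBase classes: ServerId, sameAddress, sameInstance and regionsToReassign are hypothetical names made up for this example, and the META table is modeled as a plain map. It only shows why matching on host:port alone also picks up regions that are already on the restarted server, while matching on host:port plus startcode does not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ServerIdentityDemo {

    /** Hypothetical stand-in for a server identity: address plus startcode. */
    static final class ServerId {
        final String host;
        final int port;
        final long startcode;

        ServerId(String host, int port, long startcode) {
            this.host = host;
            this.port = port;
            this.startcode = startcode;
        }

        /** Address-only comparison (the buggy check described above). */
        boolean sameAddress(ServerId other) {
            return host.equals(other.host) && port == other.port;
        }

        /** Address plus startcode: distinguishes a restarted server from the dead one. */
        boolean sameInstance(ServerId other) {
            return sameAddress(other) && startcode == other.startcode;
        }
    }

    /** Returns the regions in the (simplified) META map that belong to the dead server. */
    static List<String> regionsToReassign(Map<String, ServerId> meta,
                                          ServerId dead,
                                          boolean compareStartcode) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, ServerId> e : meta.entrySet()) {
            boolean match = compareStartcode
                ? e.getValue().sameInstance(dead)
                : e.getValue().sameAddress(dead);
            if (match) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ServerId dead = new ServerId("rs1.example.com", 60020, 1000L);
        // Same host and port, but a new startcode after the restart.
        ServerId restarted = new ServerId("rs1.example.com", 60020, 2000L);

        Map<String, ServerId> meta = new LinkedHashMap<>();
        meta.put("regionA", dead);       // still recorded on the dead instance
        meta.put("regionB", restarted);  // already opened on the restarted instance

        // Address-only check also grabs regionB, which would then be assigned twice.
        System.out.println(regionsToReassign(meta, dead, false)); // [regionA, regionB]
        // Comparing the startcode as well only picks up regionA.
        System.out.println(regionsToReassign(meta, dead, true));  // [regionA]
    }
}
```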


Cleans up RIT timeouts according to a document that is still in progress.  I'm still finishing
the document, but I'd like to get this patch tested before finalizing it.

Also found some strange stuff in server shutdown handling that could have easily led to some
double assignment issues that stack was seeing.

This addresses bug HBASE-3181.

Diffs (updated)

  trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 1029789
  trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1029789
  trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1029789
  trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1029789
  trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java 1029789
  trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1029789
  trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1029789
  trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 1029789
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1029789
  trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java 1029789
  trunk/src/test/java/org/apache/hadoop/hbase/master/TestRollingRestart.java 1029789

Diff: http://review.cloudera.org/r/1143/diff


Working on tests now.  This definitely changes some behavior that is tested in the new TestMasterFailover,
so I need to figure out whether the test should change or whether we need to handle things like CLOSING.
Maybe let it time out a few times?


