hbase-dev mailing list archives

From st...@duboce.net
Subject Re: Review Request: HBASE-3047: If new master crashes, restart is messy
Date Wed, 29 Sep 2010 06:31:23 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/915/
-----------------------------------------------------------

(Updated 2010-09-28 23:31:22.975377)


Review request for hbase, stack and Jonathan Gray.


Changes
-------

Here, this should be more robust.  Your comments should be addressed also.  For sure, AM#processFailover
has holes -- e.g. what if a regionserver crashed while the new master was coming up -- but let's
address that in another issue.  Below are notes on changes made since v1 of the patch.

M src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
  Changed because I saw a case where we hung forever (my guess is that remaining became
equal to NO_TIMEOUT).  Redid the logic here.
M src/main/java/org/apache/hadoop/hbase/regionserver/Leases.java
  Set this thread to be daemon.  Have seen it hold up RS shutdowns.
M src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
  Renamed the initialize method to createInitialFileSystemLayout, made it private, and called
it from the constructor.  It's idempotent and cheap, and there is no need for others to be
concerned with these mechanics; encapsulate it.
M src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
  Removed freshClusterStartup flag.  Now, let any 'unknown' server in and register it UNLESS
it's a dead server (fixed up expiration so we add to dead servers BEFORE we remove from online
servers).  Have waitForRegionServers return the count of regions out on the cluster.  This will be
0 if servers are coming in with a clean regionServerStartup, but if they came in and were registered
on a regionServerReport, then they'll have a filled-out HServerLoad with a count of regions.
Use the count of regions as the way to tell whether regions are out on the cluster or not.
M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  Removed freshClusterStartup.  Added logging of the state of the cluster-up flag and the # of
regionservers out on the cluster.  Use the count of regions out on the cluster to figure whether
we are to assign all user regions or instead process failover.  Now always split WALs, and
check and reassign root and meta, whether it is a fresh startup or a failover.
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
  Added notes on holes in processFailover.
M src/main/resources/hbase-default.xml
  Set checkin down from 5 to 3 seconds again.
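
For the ZooKeeperNodeTracker note above, here is a minimal sketch of one way a blocking wait can
go wrong when a sentinel timeout value means "wait forever", and how to guard against it.  It is
an illustration only, not the code in the patch: the class BlockingTracker, its methods, and the
choice of 0 for NO_TIMEOUT are assumptions made for the example.  The point is that a computed
"remaining" of 0 must end the wait rather than be passed to wait(), where 0 means no timeout.

```java
// Hypothetical stand-in for a tracker that blocks until data shows up.
class BlockingTracker {
    // Assumed sentinel: a timeout of 0 means "wait until data is available".
    static final long NO_TIMEOUT = 0;

    private byte[] data;

    synchronized void setData(byte[] newData) {
        this.data = newData;
        notifyAll(); // wake any thread blocked in blockUntilAvailable
    }

    // Returns the data, or null if timeoutMs elapsed first.
    synchronized byte[] blockUntilAvailable(long timeoutMs) throws InterruptedException {
        if (timeoutMs < 0) {
            throw new IllegalArgumentException("timeout must be >= 0");
        }
        // Decide once, up front, whether the caller asked for no timeout.
        boolean noTimeout = (timeoutMs == NO_TIMEOUT);
        long start = System.currentTimeMillis();
        long remaining = timeoutMs;
        while (data == null && (noTimeout || remaining > 0)) {
            if (noTimeout) {
                wait();
            } else {
                // remaining is always > 0 here, so it cannot be mistaken
                // for the "wait forever" value by Object.wait(long).
                wait(remaining);
            }
            remaining = timeoutMs - (System.currentTimeMillis() - start);
        }
        return data;
    }
}
```

With this shape, a caller passing a positive timeout always gets control back once that time has
passed, even if the recomputed remaining time happens to land exactly on 0.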


Summary
-------

This is a patch from Stack; just putting it up on rb.

M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
  Add test of the case where the HRegionInterface connection throws a
  ConnectException. Also tests the two new methods added to
  CatalogTracker that verify root and meta locations.
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Change order in which we start up trackers in ZK.  Also add blocking
  until master is up to make it less likely we'll start before master
  comes up, especially around the cluster start up situation.
M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  Introduce new state on startup, the case where the cluster is
  NOT a fresh startup and it's NOT a cluster where all is fully
  assigned.  The repair the master needs to run to fix up this new
  state is not yet done; we throw a NotImplementedException for
  now.  TODO.  Added new isRunningCluster checker used in figuring
  what the cluster condition is when the master is joining.  Not
  comprehensive but good enough for now.
M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
  Javadoc.
  Added new verifyRootRegionLocation and verifyMetaRegionLocation.
  Needed to verify that what's in zk is actually the locations of
  the catalog regions.
M src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
  Add the fact that the verifying method, getRegionInfo, can throw
  ConnectException.
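
To illustrate the verification idea in the CatalogTracker and HRegionInterface notes above, here
is a minimal sketch of checking that a recorded location really hosts a region, treating a
ConnectException as "nothing is listening there".  It is not the patch's code: RegionLookup is a
hypothetical stand-in for the real RPC interface, and LocationVerifier and its method are names
invented for the example.  Only java.net.ConnectException is a real JDK class.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical stand-in for the RPC interface to a regionserver.
interface RegionLookup {
    // Returns a description of the region, or throws if the server
    // cannot be reached or is not serving the region.
    String getRegionInfo(byte[] regionName) throws IOException;
}

final class LocationVerifier {
    private LocationVerifier() { }

    // True only if the server answers for the given region.
    static boolean verifyRegionLocation(RegionLookup server, byte[] regionName) {
        if (server == null) {
            return false; // no location recorded at all
        }
        try {
            return server.getRegionInfo(regionName) != null;
        } catch (ConnectException e) {
            // Nobody is listening at the recorded address: location is stale.
            return false;
        } catch (IOException e) {
            // Any other failure also means the location is not verified.
            return false;
        }
    }
}
```

The separate ConnectException branch is there only to make the stale-address case explicit; since
ConnectException is an IOException, a single catch would behave the same in this sketch.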


This addresses bug HBASE-3047.
    http://issues.apache.org/jira/browse/HBASE-3047


Diffs (updated)
-----

  trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 1001981 
  trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1001981 
  trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1001981 
  trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1001981 
  trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 1001981 
  trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1001981 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1001981 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Leases.java 1001981 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java 1001981 
  trunk/src/main/resources/hbase-default.xml 1001981 
  trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 1001981 

Diff: http://review.cloudera.org/r/915/diff


Testing
-------


Thanks,

Jonathan

