hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "jiraposter@reviews.apache.org (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager
Date Thu, 22 Sep 2011 05:51:28 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112345#comment-13112345
] 

jiraposter@reviews.apache.org commented on HBASE-4455:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2007/#review2018
-----------------------------------------------------------


Nice work.
Debug log should be reduced.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2007/#comment4548>

    Indentation.
    This should be LOG.debug()



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2007/#comment4549>

    A few spaces before timeout would be nice.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2007/#comment4550>

    Should be 'carrying .META.'



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
<https://reviews.apache.org/r/2007/#comment4551>

    There should be a verb in this sentence.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
<https://reviews.apache.org/r/2007/#comment4552>

    This line can be wrapped to the line above.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
<https://reviews.apache.org/r/2007/#comment4553>

    If variable is named matchZK, it would be symmetrical with matchAM.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
<https://reviews.apache.org/r/2007/#comment4554>

    addressFromAM should be used here.
    



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
<https://reviews.apache.org/r/2007/#comment4555>

    Personally I prefer enum over boolean.
    enum is more readable and expandable.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
<https://reviews.apache.org/r/2007/#comment4556>

    Server is repeated. And a few places below.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
<https://reviews.apache.org/r/2007/#comment4557>

    the is repeated.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
<https://reviews.apache.org/r/2007/#comment4558>

    Good.


- Ted


On 2011-09-22 00:38:16, Ming Ma wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2007/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-22 00:38:16)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  1. Add more logging.
bq.  2. Clean up CatalogTracker. waitForMeta waits for "timeout" value. When waitForMetaServerConnectionDefault
is called by MetaNodeTracker, the timeout value is large. So it doesn't retry in case .ROOT.
is updated; add the proper implementation for CatalogTracker.verifyMetaRegionLocation
bq.  4. Check for the latest -ROOT- and .META. region location during the handling of server
shutdown.
bq.  5. Right after assigning the -ROOT- or .META. in ServerShutdownHandler, don't block and
wait for .META. availability. Resubmit another ServerShutdownHandler for regular regions.
bq.  
bq.  
bq.  This addresses bug HBASE-4455.
bq.      https://issues.apache.org/jira/browse/HBASE-4455
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
1172205 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
1172205 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
1172205 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
1172205 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
1172205 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
1172205 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
1172205 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java
1172205 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
1172205 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
1172205 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
1172205 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
1172205 
bq.  
bq.  Diff: https://reviews.apache.org/r/2007/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Keep Master up all the time, do rolling restart of RSs like this - stop RS1, wait for
2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start RS2, wait for 2 seconds,
etc. The program can run for couple hours until it stops. -ROOT- and .META. are available
during that time.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ming
bq.  
bq.



> Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4455
>                 URL: https://issues.apache.org/jira/browse/HBASE-4455
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>
> Keep Master up all the time, do rolling restart of RSs like this - stop RS1, wait for
2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start RS2, wait for 2 seconds,
etc. After a while, you will find the -ROOT-, .META. regions aren't in "regions in transtion"
from AssignmentManager point of view, but they aren't assigned to any regions. Here are the
issues.
> 1. .-ROOT- or .META. location is stale when MetaServerShutdownHandler is invoked to check
if it contains -ROOT- region. That is due to long delay from ZK notification and async nature
of the system. Here is an example, even though new root region server sea-lab-1,60020,1316380133656
is set at T2, at T3 the shutdown process for sea-lab-1,60020,1316380133656, the root location
still points to old server sea-lab-3,60020,1316380037898.
> T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:6
> 0000-0x1327e43175e0000 Retrieved 29 byte(s) of data from znode /hbase/root-regio
> n-server and set watcher; sea-lab-3,60020,1316380037898
> T2: 2011-09-18 14:08:57,173 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor:
Setting ROOT region location in ZooKeeper as sea-lab-1,60020,1316380133656
> T3: 2011-09-18 14:10:26,393 DEBUG org.apache.hadoop.hbase.master.ServerManager: Adde
> d=sea-lab-1,60020,1316380133656 to dead servers, submitted shutdown handler to be executed,
root=false, meta=true, current Root Location: sea-lab-3,60020,1316380037898
> T4: 2011-09-18 14:12:37,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:6
> 0000-0x1327e43175e0000 Retrieved 29 byte(s) of data from znode /hbase/root-region-server
and set watcher; sea-lab-1,60020,1316380133656
> 2. The MetaServerShutdownHandler worker thread that waits for -ROOT- or .META. availability
could be blocked. If meanwhile, the new server that -ROOT- or .META. is being assigned restarted,
another instance of MetaServerShutdownHandler is queued. Eventually, all MetaServerShutdownHandler
worker threads are filled up. It looks like HBASE-4245.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message