hbase-user mailing list archives

From bijieshan <bijies...@huawei.com>
Subject ROOT region appeared in two regionservers' onlineRegions at the same time
Date Fri, 20 May 2011 07:33:42 GMT
This can happen, with low probability, through the following steps
(assume the cluster nodes are named RS1, RS2, and HM, and that there are more than 10,000
regions in the cluster):

1. The ROOT region was opened on RS1.
2. RS1 aborted for some reason (perhaps the HDFS process became abnormal).
3. The ServerShutdownHandler for RS1 started processing.
4. HMaster was restarted. During finishInitialization, the ROOT region location was unset
and ROOT was assigned to RS2.
5. The ROOT region was opened successfully on RS2.
6. But after a while, RS1's ServerShutdownHandler unset the ROOT region again and reassigned
it. By then, RS1 had been restarted, so there are two possibilities:
 Case a:
   The ROOT region was assigned to RS1.
   Nothing seemed to be affected, but the ROOT region was still online on RS2 as well.
 Case b:
   The ROOT region was assigned to RS2.
   The ROOT region could not be opened there, because RS2 already showed it as online; it
   stayed offline until it was reassigned to another regionserver.
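To make the interleaving concrete, here is a toy Java model of steps 1-6. All class, field, and method names are hypothetical; this is not HBase code. It assumes each regionserver keeps its own onlineRegions set, the master keeps a single ROOT location (null meaning "unset") that any handler may clear, and an open on a server that already has the region is skipped, as the WARN line in the logs below suggests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the race (hypothetical names, not HBase classes).
public class RootRaceDemo {
    static final String ROOT = "-ROOT-,,0.70236052";

    // Each regionserver's own view of its onlineRegions.
    final Map<String, Set<String>> onlineRegions = new HashMap<>();
    // Master's record of the ROOT location; null means "unset".
    String rootLocation;

    // Assign ROOT to a server. If the server already has it online, the open
    // is skipped (like the "already online" WARN) and no location is recorded.
    void assignRoot(String server) {
        Set<String> regions = onlineRegions.computeIfAbsent(server, s -> new HashSet<>());
        if (regions.add(ROOT)) {
            rootLocation = server;
        }
    }

    // Servers whose onlineRegions currently contain ROOT, sorted by name.
    List<String> serversHoldingRoot() {
        List<String> holders = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : onlineRegions.entrySet()) {
            if (e.getValue().contains(ROOT)) {
                holders.add(e.getKey());
            }
        }
        holders.sort(null); // natural (alphabetical) order
        return holders;
    }

    // Replays steps 1-6; secondTarget is where step 6 reassigns ROOT.
    static RootRaceDemo replay(String secondTarget) {
        RootRaceDemo c = new RootRaceDemo();
        c.assignRoot("RS1");                          // 1. ROOT opened on RS1
        c.onlineRegions.remove("RS1");                // 2-3. RS1 aborts; its shutdown handler is pending
        c.rootLocation = null;                        // 4. restarted master unsets ROOT...
        c.assignRoot("RS2");                          //    ...and assigns it to RS2 (5. opened)
        c.onlineRegions.put("RS1", new HashSet<>());  // RS1 restarts with no regions
        c.rootLocation = null;                        // 6. RS1's stale shutdown handler unsets ROOT again
        c.assignRoot(secondTarget);                   //    and reassigns it
        return c;
    }

    public static void main(String[] args) {
        RootRaceDemo a = replay("RS1");
        System.out.println("case a: holders=" + a.serversHoldingRoot() + " location=" + a.rootLocation);
        RootRaceDemo b = replay("RS2");
        System.out.println("case b: holders=" + b.serversHoldingRoot() + " location=" + b.rootLocation);
    }
}
```

Running `main` prints `case a: holders=[RS1, RS2] location=RS1` and `case b: holders=[RS2] location=null`. In case a both servers hold ROOT; in case b the open is skipped, so in this model the master never records a new location.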

The logs confirm this:

1. The ROOT region was opened twice:
2011-05-17 10:32:59,188 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on 162-2-77-0,20020,1305598359031
2011-05-17 10:33:01,536 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on 162-2-16-6,20020,1305597548212

2. Regionserver 162-2-16-6 aborted, so ROOT was reassigned to 162-2-77-0, where it was
already online:
10:49:30,920 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: -ROOT-,,0.70236052
10:49:30,920 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of -ROOT-,,0.70236052
10:49:30,920 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of -ROOT-,,0.70236052 but already online on this server

Although it only happens in a special scenario, this can keep the ROOT region offline for a
long time. I have checked the code, and it appears to be a small bug.
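One way to picture the bug is that RS1's shutdown handler unsets ROOT unconditionally, even after the restarted master has already moved it. The sketch below is a hypothetical guard, not HBase's actual code or fix, and every name in it is an assumption: the handler clears the ROOT location only if it still points at the dead server. For simplicity it treats an already-unset location as "not ours"; a real fix would also have to account for ROOT being in transition.

```java
import java.util.Objects;

// Hypothetical master-side ROOT location tracker (not an HBase class).
public class RootLocationTracker {
    private String rootLocation; // null means "unset"

    public synchronized void setRootLocation(String server) {
        rootLocation = server;
    }

    public synchronized String getRootLocation() {
        return rootLocation;
    }

    // Clears ROOT only if it is still recorded on the dead server.
    // Returns true if the caller should go on to reassign ROOT.
    public synchronized boolean unsetIfOn(String deadServer) {
        if (Objects.equals(rootLocation, deadServer)) {
            rootLocation = null;
            return true;
        }
        return false; // ROOT already lives elsewhere; leave it alone
    }

    public static void main(String[] args) {
        RootLocationTracker t = new RootLocationTracker();
        t.setRootLocation("RS1");
        // The restarted master moves ROOT to RS2 before RS1's shutdown handler finishes.
        t.setRootLocation("RS2");
        boolean reassign = t.unsetIfOn("RS1");
        System.out.println("reassign=" + reassign + " location=" + t.getRootLocation());
    }
}
```

Here `main` prints `reassign=false location=RS2`. Server names in the logs above carry a start code (for example `162-2-77-0,20020,1305598359031`), so comparing full server names rather than host names would also distinguish a restarted RS1 from the instance that died.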


Jieshan Bean 

