hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: ROOT region appeared in two regionserver's onlineRegions at the same time
Date Mon, 23 May 2011 16:47:29 GMT
That sounds reasonable, Jieshan. Would you mind filing an issue
referring to this mail thread?  If you have a patch, that'd be
excellent.
St.Ack

2011/5/23 bijieshan <bijieshan@huawei.com>:
> There are two callers of assignRoot():
>
> 1.
> HMaster# assignRootAndMeta:
>
>    if (!catalogTracker.verifyRootRegionLocation(timeout)) {
>      this.assignmentManager.assignRoot();
>      this.catalogTracker.waitForRoot();
>      assigned++;
>    }
>
> 2.
> ServerShutdownHandler# process:
>
>      if (isCarryingRoot()) { // -ROOT-
>        try {
>           this.services.getAssignmentManager().assignRoot();
>        } catch (KeeperException e) {
>           this.server.abort("In server shutdown processing, assigning root", e);
>           throw new IOException("Aborting", e);
>        }
>      }
>
> I think that each time assignRoot() is called, we should verify the ROOT region's location first, because the ROOT region may already have been assigned elsewhere before this assignment.
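A minimal sketch of the proposed guard, assuming both call sites (HMaster#assignRootAndMeta and ServerShutdownHandler#process) were routed through one helper that verifies the current ROOT location before assigning. RootLocationVerifier, RootAssigner and GuardedRootAssignment are hypothetical names, not HBase classes: the verifier only stands in for catalogTracker.verifyRootRegionLocation(timeout), and the real assignRoot() also throws KeeperException, which this sketch leaves out.

```java
// Hypothetical stand-in, NOT an HBase interface: plays the role of
// catalogTracker.verifyRootRegionLocation(timeout) from HMaster#assignRootAndMeta.
interface RootLocationVerifier {
    // Returns true if ROOT is already deployed on a live server.
    boolean verifyRootRegionLocation(long timeoutMs);
}

// Hypothetical stand-in for AssignmentManager#assignRoot() (the real one throws KeeperException).
interface RootAssigner {
    void assignRoot();
}

final class GuardedRootAssignment {
    // Assumption: one lock shared by every caller in this process
    // (the startup path and ServerShutdownHandler).
    private static final Object LOCK = new Object();

    private GuardedRootAssignment() {}

    /**
     * Assigns ROOT only when its current location cannot be verified.
     * Returns true if an assignment was issued, false if it was skipped.
     */
    static boolean assignRootIfNeeded(RootLocationVerifier verifier, RootAssigner assigner, long timeoutMs) {
        // Serialize check-then-act so two callers cannot both see "not deployed".
        synchronized (LOCK) {
            if (verifier.verifyRootRegionLocation(timeoutMs)) {
                return false; // already online elsewhere: skip the duplicate assignment
            }
            assigner.assignRoot();
            return true;
        }
    }

    public static void main(String[] args) {
        int[] assignments = {0};
        // Simulate ServerShutdownHandler running after ROOT is already online (e.g. on RS2).
        boolean issued = assignRootIfNeeded(timeout -> true, () -> assignments[0]++, 1000L);
        System.out.println("issued=" + issued + ", assignments=" + assignments[0]);
    }
}
```

The lock is an assumption added here: verify-then-assign is a check-then-act sequence, so if the two callers ran concurrently without being serialized, both could still observe ROOT as unassigned and both assign it. A process-local lock only covers callers inside the same master process.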
> I look forward to your replies.
>
> Thanks!
>
> Regards,
> Jieshan Bean
>
>
> -----Original Message-----
> From: bijieshan [mailto:bijieshan@huawei.com]
> Sent: May 20, 2011 15:34
> To: user@hbase.apache.org
> Cc: Chenjian
> Subject: ROOT region appeared in two regionserver's onlineRegions at the same time
>
> This can happen, with low probability, through the following steps.
> (Suppose the cluster nodes are named RS1, RS2 and HM, and there are more than 10,000 regions in the cluster.)
>
> 1. The ROOT region was opened on RS1.
> 2. For some reason (perhaps the HDFS process became abnormal), RS1 aborted.
> 3. ServerShutdownHandler processing for RS1 started.
> 4. HMaster was restarted. During finishInitialization, the ROOT region location was unset and ROOT was assigned to RS2.
> 5. The ROOT region was opened successfully on RS2.
> 6. After a while, the ROOT region location was unset again by RS1's ServerShutdownHandler, and ROOT was reassigned. RS1 had been restarted before that, so there are two possibilities:
>  Case a:
>   The ROOT region was assigned to RS1.
>   Nothing seemed to be affected, but the ROOT region was still online on RS2.
>
>  Case b:
>   The ROOT region was assigned to RS2.
>   The ROOT region could not be opened until it was reassigned to another regionserver, because RS2 already showed it as online.
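To make steps 1 to 6 concrete, here is a toy model in Java under simplifying assumptions: each server's onlineRegions is a set, the master keeps a single believed ROOT location, RS1's abort and later restart are collapsed into one step, and RootRaceModel with all of its methods is hypothetical, not HBase code. Replaying the sequence with verifyFirst set to false reproduces both cases; setting it to true applies the verify-before-assign check proposed in the follow-up reply.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model, NOT HBase code: replays the ROOT double-assignment race.
final class RootRaceModel {
    static final String ROOT = "-ROOT-,,0.70236052";

    // Each regionserver's onlineRegions.
    private final Map<String, Set<String>> onlineRegions = new HashMap<>();
    // Where the master believes ROOT is deployed (null = unset).
    private String rootLocation;

    RootRaceModel(String... servers) {
        for (String s : servers) {
            onlineRegions.put(s, new HashSet<>());
        }
    }

    // Regionserver side: refuses to reopen a region it already hosts
    // (cf. "already online on this server"); the master's location is not updated then.
    boolean open(String server) {
        if (!onlineRegions.get(server).add(ROOT)) {
            return false;
        }
        rootLocation = server;
        return true;
    }

    // Abort followed by restart: the server comes back with nothing online.
    void abortAndRestart(String server) {
        onlineRegions.get(server).clear();
    }

    // Master side: is ROOT really online where the master thinks it is?
    boolean verifyRootLocation() {
        return rootLocation != null && onlineRegions.get(rootLocation).contains(ROOT);
    }

    // Unsets the location and assigns ROOT to target, optionally verifying first.
    boolean assignRoot(String target, boolean verifyFirst) {
        if (verifyFirst && verifyRootLocation()) {
            return false; // ROOT already deployed: skip the duplicate assignment
        }
        rootLocation = null;
        return open(target);
    }

    long serversHostingRoot() {
        return onlineRegions.values().stream().filter(r -> r.contains(ROOT)).count();
    }

    // Replays steps 1-6; sshTarget is where RS1's ServerShutdownHandler sends ROOT.
    static RootRaceModel replay(String sshTarget, boolean verifyFirst) {
        RootRaceModel m = new RootRaceModel("RS1", "RS2");
        m.open("RS1");                        // 1. ROOT opened on RS1
        m.abortAndRestart("RS1");             // 2-3. RS1 aborts (restarts later, empty); SSH pending
        m.assignRoot("RS2", true);            // 4-5. restarted master: verify fails, ROOT goes to RS2
        m.assignRoot(sshTarget, verifyFirst); // 6. SSH for RS1 assigns ROOT again
        return m;
    }

    public static void main(String[] args) {
        RootRaceModel a = replay("RS1", false);
        RootRaceModel b = replay("RS2", false);
        RootRaceModel guarded = replay("RS1", true);
        System.out.println("case a, no check: servers hosting ROOT = " + a.serversHostingRoot());
        System.out.println("case b, no check: ROOT location verified = " + b.verifyRootLocation());
        System.out.println("with check: servers hosting ROOT = " + guarded.serversHostingRoot());
    }
}
```

In this model, case a ends with ROOT online on both RS1 and RS2; case b ends with ROOT online on RS2 while the master's location stays unset, which mirrors the "already online on this server" warning in the logs; and the guarded replay leaves exactly one verified location.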
>
> This can be confirmed from the logs:
>
> 1. The ROOT region was opened twice:
> 2011-05-17 10:32:59,188 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on 162-2-77-0,20020,1305598359031
> 2011-05-17 10:33:01,536 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on 162-2-16-6,20020,1305597548212
>
> 2. Regionserver 162-2-16-6 aborted, so ROOT was reassigned to 162-2-77-0, where it was already online:
> 10:49:30,920 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: -ROOT-,,0.70236052
> 10:49:30,920 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of -ROOT-,,0.70236052
> 10:49:30,920 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of -ROOT-,,0.70236052 but already online on this server
>
> This can keep the ROOT region offline for a long time, even though it only happens in a special scenario. I have checked the code, and this seems to be a small bug.
>
> Thanks!
>
> Regards,
> Jieshan Bean
>
