hbase-issues mailing list archives

From "rajeshbabu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9968) Cluster is non operative if the RS carrying -ROOT- is expiring after deleting -ROOT- region transition znode and before adding it to online regions.
Date Thu, 14 Nov 2013 06:41:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822217#comment-13822217 ]

rajeshbabu commented on HBASE-9968:
-----------------------------------

[~lhofhansl]
Trunk should have the same problem with the meta region. Anyway, I will check.
To fix this, I think it is better to read the location from the catalog tracker if we don't
find the region location in the AM or the transition znode. I will test this and upload the patch.
{code}
        ServerName addressFromCT =
            hri.isRootRegion() ? this.catalogTracker.getRootLocation()
                : hri.isMetaRegion() ? this.catalogTracker.getMetaLocationOrReadLocationFromRoot()
                    : null;      
        boolean matchCT = (addressFromCT != null && addressFromCT.equals(serverName));
        LOG.debug("based on CT, current region=" + hri.getRegionNameAsString() +
          " is on server=" + (addressFromCT != null ? addressFromCT : "null") +
          " server being checked: " + serverName);
{code}
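
The lookup order proposed above can be illustrated with a small standalone sketch. This is not HBase code: the class `CarryingCheckSketch`, its maps, and the server name `rs1` are hypothetical stand-ins for the transition znode, the AssignmentManager's online regions, and the catalog tracker. It also assumes the catalog tracker still reports the expired server as the ROOT location at the time of the check.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the "is this dead server carrying ROOT?" check; not HBase code. */
class CarryingCheckSketch {
    // Stand-ins for the three places a region location can be found.
    final Map<String, String> transitionZnodes = new HashMap<>(); // region -> server, from ZK
    final Map<String, String> onlineRegions = new HashMap<>();    // region -> server, from the AM
    final Map<String, String> catalogTracker = new HashMap<>();   // region -> server, from the CT
    final Set<String> onlineServers = new HashSet<>();

    /** Mirrors the quoted logic: a region is recorded only if its server is online. */
    void regionOnline(String region, String server) {
        if (onlineServers.contains(server)) {
            onlineRegions.put(region, server);
        }
    }

    /** Checks the transition znode, then the AM, then (optionally) the catalog tracker. */
    boolean isCarrying(String region, String server, boolean useCatalogTracker) {
        String location = transitionZnodes.get(region);
        if (location == null) {
            location = onlineRegions.get(region);
        }
        if (location == null && useCatalogTracker) {
            location = catalogTracker.get(region);
        }
        return server.equals(location);
    }

    /** Builds the state described in the issue: znode deleted, server already expired. */
    static CarryingCheckSketch raceScenario() {
        CarryingCheckSketch s = new CarryingCheckSketch();
        s.catalogTracker.put("-ROOT-", "rs1"); // assumption: ROOT location still points at rs1
        // The transition znode is already deleted, so transitionZnodes stays empty.
        // rs1 has expired, so it is not in onlineServers and regionOnline records nothing.
        s.regionOnline("-ROOT-", "rs1");
        return s;
    }

    public static void main(String[] args) {
        CarryingCheckSketch s = raceScenario();
        System.out.println("without catalog tracker: " + s.isCarrying("-ROOT-", "rs1", false));
        System.out.println("with catalog tracker: " + s.isCarrying("-ROOT-", "rs1", true));
    }
}
```

In this model the check misses the ROOT region when only the znode and the AM are consulted, and finds it once the catalog tracker is used as a last resort.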

> Cluster is non operative if the RS carrying -ROOT- is expiring after deleting -ROOT-
region transition znode and before adding it to online regions.
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9968
>                 URL: https://issues.apache.org/jira/browse/HBASE-9968
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 0.94.11
>            Reporter: rajeshbabu
>            Assignee: rajeshbabu
>
> When we check whether the dead region server is carrying root or meta, we first check
> whether a transition znode exists for the region. In this case it had already been deleted,
> so we cannot find the region location from ZooKeeper.
> {code}
>     try {
>       data = ZKAssign.getData(master.getZooKeeper(), hri.getEncodedName());
>     } catch (KeeperException e) {
>       master.abort("Unexpected ZK exception reading unassigned node for region="
>         + hri.getEncodedName(), e);
>     }
> {code}
> Next we check with the AssignmentManager whether the region is in its online regions:
> {code}
>     ServerName addressFromAM = getRegionServerOfRegion(hri);
>     boolean matchAM = (addressFromAM != null &&
>       addressFromAM.equals(serverName));
>     LOG.debug("based on AM, current region=" + hri.getRegionNameAsString() +
>       " is on server=" + (addressFromAM != null ? addressFromAM : "null") +
>       " server being checked: " + serverName);
> {code}
> From the AM we get null, because when adding a region to the online regions we check
> whether the RS is in the online servers, and if it is not, we do not add the region to the
> online regions.
> {code}
>       if (isServerOnline(sn)) {
>         this.regions.put(regionInfo, sn);
>         addToServers(sn, regionInfo);
>         this.regions.notifyAll();
>       } else {
>         LOG.info("The server is not in online servers, ServerName=" + 
>           sn.getServerName() + ", region=" + regionInfo.getEncodedName());
>       }
> {code}
> Even though the dead region server was carrying the ROOT region, the check returns false.
> After that, the ROOT region is never assigned.
> Here are the logs
> {code}
> 2013-11-11 18:04:14,730 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting
ROOT region location in ZooKeeper
> 2013-11-11 18:04:14,775 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous
transition plan was found (or we are ignoring an existing plan) for -ROOT-,,0.70236052 so
generated a random one; hri=-ROOT-,,0.70236052, src=, dest=HOST-10-18-40-69,60020,1384173244404;
1 (online=1, available=1) available servers
> 2013-11-11 18:04:14,809 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning
region -ROOT-,,0.70236052 to HOST-10-18-40-69,60020,1384173244404
> 2013-11-11 18:04:18,375 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@12133926;
serverName=HOST-10-18-40-69,60020,1384173244404
> 2013-11-11 18:04:26,213 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling
transition=RS_ZK_REGION_OPENED, server=HOST-10-18-40-69,60020,1384173244404, region=70236052/-ROOT-
> 2013-11-11 18:04:26,213 INFO org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
Handling OPENED event for -ROOT-,,0.70236052 from HOST-10-18-40-69,60020,1384173244404; deleting
unassigned node
> 2013-11-11 18:04:31,553 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: based
on AM, current region=-ROOT-,,0.70236052 is on server=null server being checked: HOST-10-18-40-69,60020,1384173244404
> 2013-11-11 18:04:31,561 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=HOST-10-18-40-69,60020,1384173244404
to dead servers, submitted shutdown handler to be executed, root=false, meta=false
> {code}
> {code}
> 2013-11-11 18:04:32,323 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode
of region -ROOT-,,0.70236052 has been deleted.
> 2013-11-11 18:04:32,323 INFO org.apache.hadoop.hbase.master.AssignmentManager: The server
is not in online servers, ServerName=HOST-10-18-40-69,60020,1384173244404, region=70236052
> 2013-11-11 18:04:32,323 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master
has opened the region -ROOT-,,0.70236052 that was online on HOST-10-18-40-69,60020,1384173244404
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)