hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8545) Meta stuck in transition when it is assigned to a just restarted dead region sever
Date Wed, 22 May 2013 06:11:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663823#comment-13663823 ]

Hudson commented on HBASE-8545:
-------------------------------

Integrated in hbase-0.95-on-hadoop2 #107 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/107/])
    HBASE-8545 Meta stuck in transition when it is assigned to a just restarted dead region sever (Revision 1484876)

     Result = FAILURE
jxiang : 
Files : 
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java

                
> Meta stuck in transition when it is assigned to a just restarted dead region sever 
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-8545
>                 URL: https://issues.apache.org/jira/browse/HBASE-8545
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>             Fix For: 0.98.0, 0.95.1
>
>         Attachments: trunk-8545.patch, trunk-8545_v2.patch, trunk-8545_v3.patch
>
>
> Suppose the meta region server is down, and the SSH (ServerShutdownHandler) tries to re-assign it.  This could happen:
> 1. AM plans to assign meta to a region server (R_old);
> 2. R_old then dies, and a new region server (R_new) starts up on the same host and port but gets a different start code;
> 3. AM sends the open region request to R_new and the meta is opened on it;
> 4. AM gets the ZK event, but it is from a different region server instance (R_new), not the expected one (R_old), so it sends a close region request to R_new;
> 5. Now the meta is stuck in transition and won't be assigned.
> This won't happen to a user region, since the SSH for R_old will find the user region stuck in transition and re-assign it.  For meta, it is a little different: AM checks whether a dead region server carries the meta based on the ZK info, which the open region handler changed to the new region server R_new at step 3.
> The fix I was thinking about is:
> 1. When checking whether a region server carries a region, use the region transition information if it exists (which is the source of truth for the master); if not, check the ZK data as before;
> 2. In the open region handler, when transitioning the assign ZK node from offline to opening, make sure the current region server is the expected one (ZK#transitionNode; the existing code doesn't check the target server name).
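
The two parts of the proposed fix can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: ServerId, regionsInTransition, zkLocation, isCarryingRegion and transitionOfflineToOpening are hypothetical stand-ins, plain maps replace the master's in-memory state and ZooKeeper, and the host, port and start code values are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; none of these names are taken from the HBase code base.
class MetaAssignmentSketch {

    /** Stand-in for a region server identity: host, port, and start code. */
    static final class ServerId {
        final String host;
        final int port;
        final long startCode;

        ServerId(String host, int port, long startCode) {
            this.host = host;
            this.port = port;
            this.startCode = startCode;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ServerId)) {
                return false;
            }
            ServerId other = (ServerId) o;
            // Two instances on the same host and port still differ by start code.
            return host.equals(other.host) && port == other.port
                && startCode == other.startCode;
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, startCode);
        }
    }

    // Assumption: the master's in-memory view, region name -> planned target server.
    final Map<String, ServerId> regionsInTransition = new HashMap<>();
    // Assumption: stand-in for the region location recorded in ZooKeeper.
    final Map<String, ServerId> zkLocation = new HashMap<>();

    /** Fix item 1: prefer the region transition information, else fall back to ZK data. */
    boolean isCarryingRegion(ServerId server, String region) {
        ServerId planned = regionsInTransition.get(region);
        if (planned != null) {
            return planned.equals(server);
        }
        return server.equals(zkLocation.get(region));
    }

    /** Fix item 2: allow offline -> opening only if the current server is the expected one. */
    static boolean transitionOfflineToOpening(ServerId expected, ServerId current) {
        return expected.equals(current);
    }

    public static void main(String[] args) {
        ServerId rOld = new ServerId("host1", 60020, 100L);
        ServerId rNew = new ServerId("host1", 60020, 200L); // same host and port, new start code

        MetaAssignmentSketch am = new MetaAssignmentSketch();
        am.regionsInTransition.put("meta", rOld); // step 1: the plan targets R_old
        am.zkLocation.put("meta", rNew);          // step 3: ZK now names R_new

        System.out.println("R_old carries meta: " + am.isCarryingRegion(rOld, "meta"));
        System.out.println("R_new may open meta: " + transitionOfflineToOpening(rOld, rNew));
    }
}
```

In this sketch, a check based only on the ZK data would conclude that R_old does not carry meta, which matches the stuck state described above; consulting the transition information first attributes meta to R_old, so its shutdown handling can re-assign it.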

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira