hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
Date Fri, 30 Dec 2011 01:23:30 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177496#comment-13177496 ]

ramkrishna.s.vasudevan commented on HBASE-5094:
-----------------------------------------------

Steps to reproduce the problem:
->1) The load balancer starts moving region R1 from RS1 to RS2.
->2) Before RS2 has updated the META table, RS1 goes down.
->3) ServerShutdownHandler starts for RS1:
       a) it first removes R1 from the master's online-region list, and
       b) it sees R1 on RS1 according to the META entry.
->4) At that point RS2 completes the opening and updates META.
->5) The callback reaches the master, which removes R1 from RIT but has not yet added it to the online-region list.
->6) Step 3 continues: the handler finds that the address in the AM (addressFromAM) is null and that there is no RIT entry (rit is null), so it assigns R1 again.
->7) R1 is now opened on RS3, META is updated to RS3, and the operation completes. The master also records R1 as online on RS3, so at this point R1 is open on both RS2 and RS3.
->8) RS3 goes down.
->9) R1 is reassigned from RS3 to RS2, and RS2 responds ALREADY_OPENED.
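
The sequence above can be replayed deterministically in a small model. This is a minimal sketch under stated assumptions: `Hbase5094Race`, its maps and `wouldAssign` are hypothetical stand-ins for the master's RIT set, the AssignmentManager's online-region list and META, not HBase classes. The check is simplified from the ServerShutdownHandler excerpt quoted in the issue (it ignores the CLOSING/PENDING_CLOSE cases), and the steps run sequentially instead of on real threads.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of the interleaving in steps 1-7; not HBase code.
public class Hbase5094Race {
    final Set<String> rit = new HashSet<>();                 // master: regions in transition
    final Map<String, String> amOnline = new HashMap<>();    // master/AM: region -> server
    final Map<String, String> meta = new HashMap<>();        // catalog: region -> server
    final Map<String, Set<String>> openOn = new HashMap<>(); // region -> servers really serving it

    // A region server finishes opening a region and writes its location to META.
    void open(String region, String server) {
        openOn.computeIfAbsent(region, k -> new TreeSet<>()).add(server);
        meta.put(region, server);
    }

    // Simplified shutdown-handler check (ignores the CLOSING/PENDING_CLOSE cases).
    boolean wouldAssign(String region, String deadServer) {
        if (rit.contains(region)) {
            return false; // skip: region is in transition
        }
        String addressFromAM = amOnline.get(region);
        if (addressFromAM != null && !addressFromAM.equals(deadServer)) {
            return false; // skip: already opened elsewhere
        }
        return true; // neither skip condition holds, so the region is assigned again
    }

    // Replays steps 1-7 in order and returns the servers on which R1 ends up open.
    static Set<String> replay() {
        Hbase5094Race m = new Hbase5094Race();
        m.open("R1", "RS1");
        m.amOnline.put("R1", "RS1");

        m.rit.add("R1");                                      // 1) balancer moves R1 from RS1 to RS2
        m.openOn.get("R1").remove("RS1");                     // 2) RS1 dies before RS2 updates META
        m.amOnline.remove("R1");                              // 3a) handler drops R1 from the online list
        boolean metaSaysRs1 = "RS1".equals(m.meta.get("R1")); // 3b) handler sees R1 -> RS1 in META
        m.open("R1", "RS2");                                  // 4) RS2 finishes opening, updates META
        m.rit.remove("R1");                                   // 5) callback clears RIT, online list not yet updated
        if (metaSaysRs1 && m.wouldAssign("R1", "RS1")) {      // 6) RIT and AM address are both empty
            m.open("R1", "RS3");                              // 7) R1 opened on RS3, META says RS3
            m.amOnline.put("R1", "RS3");
        }
        return m.openOn.get("R1");
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [RS2, RS3]
    }
}
```

Running `main` prints `[RS2, RS3]`: in this model R1 ends up open on two servers, which is the state that step 9 runs into.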

                
> The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5094
>                 URL: https://issues.apache.org/jira/browse/HBASE-5094
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: HBASE-5094_1.patch
>
>
> {code}
> RegionState rit = this.services.getAssignmentManager().isRegionInTransition(e.getKey());
> ServerName addressFromAM = this.services.getAssignmentManager()
>     .getRegionServerOfRegion(e.getKey());
> if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
>   // Skip regions that were in transition unless CLOSING or
>   // PENDING_CLOSE
>   LOG.info("Skip assigning region " + rit.toString());
> } else if (addressFromAM != null
>     && !addressFromAM.equals(this.serverName)) {
>   LOG.debug("Skip assigning region "
>       + e.getKey().getRegionNameAsString()
>       + " because it has been opened in "
>       + addressFromAM.getServerName());
> }
> {code}
> In ServerShutdownHandler we try to get the region's address from the AM. This address can still be null because the AM has not yet been updated after the region was opened, i.e. the master-side callback that runs after node deletion has not yet finished.
> However, the region has already been removed from RIT on the master side, so neither skip condition applies and a new assignment is triggered.
> So there is a small window, between the region leaving RIT and being added to the online list, in which the ServerShutdownHandler check of the existing address in the AM finds nothing.
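
One general way to close a window like this, shown only as a hypothetical sketch and not as the approach taken in HBASE-5094_1.patch, is to make "leave RIT" and "become online" a single atomic step and have the shutdown check read both under the same lock, so it can never observe a region in neither state. `AtomicRegionStates` and its methods are illustrative names, not HBase APIs, and the skip/assign rule mirrors the quoted excerpt without its CLOSING/PENDING_CLOSE refinement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: RIT and online-region bookkeeping guarded by one lock. Not HBase code.
public class AtomicRegionStates {
    private final Object lock = new Object();
    private final Map<String, String> regionsInTransition = new HashMap<>(); // region -> target server
    private final Map<String, String> onlineRegions = new HashMap<>();       // region -> hosting server

    public void startTransition(String region, String targetServer) {
        synchronized (lock) {
            regionsInTransition.put(region, targetServer);
        }
    }

    // Open completed: leave RIT and become online in one step, so no reader sees the gap.
    public void regionOpened(String region, String server) {
        synchronized (lock) {
            regionsInTransition.remove(region);
            onlineRegions.put(region, server);
        }
    }

    // Shutdown-handler style check: reassign only if the region is neither
    // in transition nor known to be online on a live server.
    public boolean shouldReassign(String region, String deadServer) {
        synchronized (lock) {
            if (regionsInTransition.containsKey(region)) {
                return false;
            }
            String address = onlineRegions.get(region);
            return address == null || address.equals(deadServer);
        }
    }

    public static void main(String[] args) {
        AtomicRegionStates states = new AtomicRegionStates();
        states.startTransition("R1", "RS2");
        System.out.println(states.shouldReassign("R1", "RS1")); // false: still in transition
        states.regionOpened("R1", "RS2");
        System.out.println(states.shouldReassign("R1", "RS1")); // false: online on RS2
        states.regionOpened("R2", "RS1");
        System.out.println(states.shouldReassign("R2", "RS1")); // true: only on the dead server
    }
}
```

With the two updates fused, the check at step 6 would see R1 either in RIT or online on RS2 and skip it; the trade-off is that every reader and writer of this state contends on one lock.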
