hbase-issues mailing list archives

From "Zhihong Yu (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
Date Mon, 26 Dec 2011 10:17:31 GMT

     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5094:
------------------------------

          Description: 
{code}
RegionState rit = this.services.getAssignmentManager().isRegionInTransition(e.getKey());
ServerName addressFromAM = this.services.getAssignmentManager()
    .getRegionServerOfRegion(e.getKey());
if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
  // Skip regions that were in transition unless CLOSING or
  // PENDING_CLOSE
  LOG.info("Skip assigning region " + rit.toString());
} else if (addressFromAM != null
    && !addressFromAM.equals(this.serverName)) {
  LOG.debug("Skip assigning region "
      + e.getKey().getRegionNameAsString()
      + " because it has been opened in "
      + addressFromAM.getServerName());
}
{code}
In ServerShutdownHandler we try to get the region's address from the AssignmentManager (AM).
This address is initially null because it has not yet been updated after the region was
opened, i.e. the callback after node deletion has not yet run on the master side.
But removal from RIT has already completed on the master side, so this triggers a new
assignment. There is therefore a small window between the region actually being added to the
online list and the point where ServerShutdownHandler checks the existing address in the AM.
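
The window described above can be illustrated with a minimal sketch. It is a simplified
model of the quoted check and not HBase code: the class ShutdownDecisionSketch, its enums,
and the use of plain strings for server names are all hypothetical. The final fall-through to
REASSIGN is an assumption based on the statement that the race "will trigger a new
assignment"; the quoted fragment does not show that branch.

```java
// Hypothetical, simplified model of the check quoted above; not HBase code.
final class ShutdownDecisionSketch {
    // Subset of region-in-transition states, for illustration only.
    enum RitState { OPENING, PENDING_CLOSE, CLOSING }

    enum Decision { SKIP_IN_TRANSITION, SKIP_OPENED_ELSEWHERE, REASSIGN }

    // rit: state of the region if it is in transition, or null if it is not.
    // addressFromAM: server the AM believes hosts the region, or null if unknown.
    // deadServer: the server being processed by the shutdown handler.
    static Decision decide(RitState rit, String addressFromAM, String deadServer) {
        if (rit != null && rit != RitState.CLOSING && rit != RitState.PENDING_CLOSE) {
            return Decision.SKIP_IN_TRANSITION;
        } else if (addressFromAM != null && !addressFromAM.equals(deadServer)) {
            return Decision.SKIP_OPENED_ELSEWHERE;
        }
        // Assumed fall-through: nothing says the region lives elsewhere, so reassign.
        return Decision.REASSIGN;
    }

    public static void main(String[] args) {
        // Region still opening on another server: skipped.
        System.out.println(decide(RitState.OPENING, null, "RS1"));
        // AM already knows the region is on RS2: skipped.
        System.out.println(decide(null, "RS2", "RS1"));
        // Race window: removed from RIT, but the AM address is not yet set.
        System.out.println(decide(null, null, "RS1"));
    }
}
```

In this model the third call is the problematic one: the region is no longer in transition
and the AM has no address for it yet, so neither skip branch applies.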

  was:
R1 is reassigned to RS3 during RS1 shutdown, even though R1 was just assigned to RS2 by the
load balancer. The .META. table then indicates that R1 is on RS3, and both RS2 and RS3 think
they have R1. Later, when RS3 shuts down, R1 is reassigned to RS2, and RS2 replies
ALREADY_OPENED. Thus the region is considered assigned to RS2 even though .META. indicates it
is on RS3.



1) Region R1 is reassigned from RS1 to RS2.
2) RS1 goes down and ServerShutdownHandler runs.  It finds R1 listed under RS1 in META,
because META has not yet been updated to RS2.
3) Because RS1 went down, R1 is assigned from RS1 to RS3.
4) RS3 goes down.  ServerShutdownHandler processes R1 and tries to assign it to RS2.
5) RS2 replies ALREADY_OPENED, but META shows RS3.
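
The five steps can be replayed in a toy model to see where the two views diverge. This is a
sketch under stated assumptions, not HBase code: the class MetaMismatchSketch, its maps, and
the writeMeta flag are hypothetical, and the ordering in which the delayed META update from
RS2 lands before RS3 writes its own entry is assumed for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the five steps above; not HBase code.
final class MetaMismatchSketch {
    final Map<String, String> meta = new HashMap<>();        // region -> server according to META
    final Map<String, String> masterView = new HashMap<>();  // region -> server according to the master
    final Map<String, Set<String>> online = new HashMap<>(); // server -> regions it is serving

    // Returns false (modelling ALREADY_OPENED) if the server already serves the region.
    // writeMeta=false models a META update that has not landed yet.
    boolean open(String server, String region, boolean writeMeta) {
        boolean opened = online.computeIfAbsent(server, s -> new HashSet<>()).add(region);
        if (opened && writeMeta) {
            meta.put(region, server);
        }
        masterView.put(region, server);
        return opened;
    }

    void serverDown(String server) {
        online.remove(server);
    }

    static MetaMismatchSketch replay() {
        MetaMismatchSketch c = new MetaMismatchSketch();
        c.open("RS1", "R1", true);            // initial placement
        c.online.get("RS1").remove("R1");     // step 1: R1 moves from RS1 to RS2 ...
        c.open("RS2", "R1", false);           // ... but META still says RS1
        c.serverDown("RS1");                  // step 2: RS1 goes down
        boolean metaSaysDead = "RS1".equals(c.meta.get("R1"));
        c.meta.put("R1", "RS2");              // assumed: delayed META update lands now
        if (metaSaysDead) {
            c.open("RS3", "R1", true);        // step 3: R1 assigned to RS3, META -> RS3
        }
        c.serverDown("RS3");                  // step 4: RS3 goes down
        c.open("RS2", "R1", true);            // step 5: ALREADY_OPENED, META untouched
        return c;
    }

    public static void main(String[] args) {
        MetaMismatchSketch c = replay();
        System.out.println("META:   R1 -> " + c.meta.get("R1"));
        System.out.println("master: R1 -> " + c.masterView.get("R1"));
    }
}
```

At the end of the replay the model's META entry still names RS3 while the master's view
names RS2, which is the mismatch described in step 5.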

I was able to reproduce the scenario in 0.92.

    Affects Version/s: 0.92.0
    
> The META can hold an entry for a region with a different server name from the one actually
in the AssignmentManager thus making the region inaccessible.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5094
>                 URL: https://issues.apache.org/jira/browse/HBASE-5094
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: ramkrishna.s.vasudevan

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira