hbase-issues mailing list archives

From "Chinna Rao Lalam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5806) Handle split region related failures on master restart and RS restart
Date Mon, 30 Apr 2012 14:18:48 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264950#comment-13264950 ]

Chinna Rao Lalam commented on HBASE-5806:
-----------------------------------------


For #1 above, the RegionServer crashed in SplitTransaction.createDaughters(Server, RegionServerServices)
while removing the parent from the online regions:
{code}
    if (!testing) {
      services.removeFromOnlineRegions(this.parent.getRegionInfo().getEncodedName());
    }
{code}

Wherever the regionserver crashes, its ephemeral node is deleted and the master
gets a nodeDeleted() notification, in which the region is cleared from RIT.

But the ServerShutdownHandler executed before the nodeDeleted() event for the region node.
You can see that from the logs below:

{noformat}
2012-04-06 14:35:08,841 DEBUG org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
Removed test,,1333702991530.cdfa837563e75ac5f4dc128680cc8da8. from list of regions to assign
because in RIT; region state: SPLITTING

2012-04-06 14:35:12,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Ephemeral
node deleted, regionserver crashed?, clearing from RIT; rs=test,,1333702991530.cdfa837563e75ac5f4dc128680cc8da8.
state=SPLITTING, ts=1333703059260, server=HOST-10-18-40-25,60020,1333695183392
{noformat}

In this situation the code below returns that region in regionsInTransition:

{code}
  List<RegionState> regionsInTransition =
        this.services.getAssignmentManager().
          processServerShutdown(this.serverName);
{code}

and it satisfies !rit.isClosing() && !rit.isPendingClose(), so the region is removed from
hris:

{code}
      for (RegionState rit : regionsInTransition) {
        if (!rit.isClosing() && !rit.isPendingClose()) {
          LOG.debug("Removed " + rit.getRegion().getRegionNameAsString() +
          " from list of regions to assign because in RIT; region state: " +
          rit.getState());
          if (hris != null) hris.remove(rit.getRegion());
        }
      }
{code}
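
As a rough illustration of the filter above, here is a minimal, self-contained sketch. The RegionState, State and regionsToAssign names below are simplified, hypothetical stand-ins and not the real HBase classes; the sketch only shows why a region still in SPLITTING is dropped from the list of regions to assign when ServerShutdownHandler runs before nodeDeleted().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the real HBase classes.
class RitFilterSketch {
  enum State { SPLITTING, CLOSING, PENDING_CLOSE }

  static final class RegionState {
    final String region;
    final State state;

    RegionState(String region, State state) {
      this.region = region;
      this.state = state;
    }

    boolean isClosing() { return state == State.CLOSING; }
    boolean isPendingClose() { return state == State.PENDING_CLOSE; }
  }

  // Same condition as the loop quoted above: any RIT entry that is neither
  // CLOSING nor PENDING_CLOSE is removed from the regions to assign.
  static List<String> regionsToAssign(List<String> hris,
                                      List<RegionState> regionsInTransition) {
    List<String> result = new ArrayList<>(hris);
    for (RegionState rit : regionsInTransition) {
      if (!rit.isClosing() && !rit.isPendingClose()) {
        result.remove(rit.region);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> hris = Arrays.asList("parent", "other");
    // The parent is still SPLITTING because nodeDeleted() has not run yet.
    List<RegionState> rit =
        Arrays.asList(new RegionState("parent", State.SPLITTING));
    System.out.println(regionsToAssign(hris, rit)); // prints [other]
  }
}
```

In the sketch, the "parent" region is skipped by the shutdown handling, and once the later nodeDeleted() event clears it from RIT nothing is left to assign it again.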
The fix in SSH addresses #1.
#2 came because of HBASE-5615. However, HBASE-5615 was reverted.
#3 occurs when the master restarts after splitting is done but before CJ has cleared the region
from META. So while rebuilding the user regions we ensure that the offlined parent region is
not taken into account again.

#2 and #3 are handled together in this patch, so the fix solves both problems.
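
To make the idea behind #3 concrete, here is a minimal sketch of skipping an offlined split parent while rebuilding the user regions. RegionRow and rebuild are hypothetical names used only for illustration, and the sketch assumes the parent row left in META is flagged both offline and split; it is not the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RebuildUserRegionsSketch {
  // Hypothetical stand-in for a region row read from META; not HRegionInfo.
  static final class RegionRow {
    final String name;
    final boolean offline;
    final boolean split;

    RegionRow(String name, boolean offline, boolean split) {
      this.name = name;
      this.offline = offline;
      this.split = split;
    }
  }

  // Rebuild the list of user regions, ignoring a parent that has already
  // been split and offlined but not yet cleaned out of META.
  static List<String> rebuild(List<RegionRow> metaRows) {
    List<String> regions = new ArrayList<>();
    for (RegionRow row : metaRows) {
      if (row.offline && row.split) {
        continue; // assumption: this marks a split parent awaiting cleanup
      }
      regions.add(row.name);
    }
    return regions;
  }

  public static void main(String[] args) {
    List<RegionRow> meta = Arrays.asList(
        new RegionRow("parent", true, true),
        new RegionRow("daughterA", false, false),
        new RegionRow("daughterB", false, false));
    System.out.println(rebuild(meta)); // prints [daughterA, daughterB]
  }
}
```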
                
> Handle split region related failures on master restart and RS restart
> ---------------------------------------------------------------------
>
>                 Key: HBASE-5806
>                 URL: https://issues.apache.org/jira/browse/HBASE-5806
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: Chinna Rao Lalam
>             Fix For: 0.92.2, 0.96.0, 0.94.1
>
>         Attachments: HBASE-5806.patch
>
>
> This issue is raised to solve problems that arise when a region split has only partially
> happened and the region node in ZK, which is in RS_ZK_REGION_SPLITTING or RS_ZK_REGION_SPLIT,
> is not yet processed.
> This also tries to address HBASE-5615.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
