hbase-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4446) Rolling restart RSs scenario, regions could stay in OPENING state
Date Tue, 20 Sep 2011 06:08:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108385#comment-13108385 ]

Ming Ma commented on HBASE-4446:
--------------------------------

Good point, Todd. Thanks, Ted. Here is why the master didn't handle this. Note that part of the
log below comes from the new code. The issue is that by the time AssignmentManager gets the
notification, the RS is no longer online, so the processing based on the ZK callback is skipped.

2011-09-19 22:04:54,506 WARN org.apache.hadoop.hbase.master.AssignmentManager: Attempted to
handle region transition for server but server is not online: miweng_test,1??s$? >,1316493502701.6409ae717931daee3705f3e7d33d85b5.


2011-09-19 22:22:06,561 WARN org.apache.hadoop.hbase.master.AssignmentManager: While timing
out a region in state OPENING, found ZK node in unexpected state: RS_ZK_REGION_FAILED_OPEN
region= miweng_test,1\xC8\xFAs$\xB7 >,1316493502701.6409ae717931daee3705f3e7d33d85b5.


That also means we can fix the issue in a different way. Why does AssignmentManager.handleRegion
have to enforce the following condition and rely on TimeoutMonitor and ServerShutdownHandler
to kick in? At least for certain states, such as RS_ZK_REGION_FAILED_OPEN and RS_ZK_REGION_CLOSED,
AssignmentManager.handleRegion could still process the event even though the RS is down.

      // Verify this is a known server
      if (!serverManager.isServerOnline(sn) &&
          !this.master.getServerName().equals(sn)) {
        LOG.warn("Attempted to handle region transition for server but " +
          "server is not online: " + Bytes.toString(data.getRegionName()));
        return;
      }
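
To make the suggestion concrete, here is a minimal, self-contained sketch of a relaxed guard.
It is not the HBase code and not the attached patch: the class RegionTransitionGuard, the
method shouldHandle, and the trimmed-down EventType enum are hypothetical stand-ins, and the
choice of which states are safe to process for an offline RS is an assumption taken from the
two states named above.

```java
import java.util.EnumSet;
import java.util.Set;

public class RegionTransitionGuard {

    /** Hypothetical stand-in for HBase's EventType; only states named in this thread. */
    enum EventType {
        RS_ZK_REGION_OPENING,
        RS_ZK_REGION_FAILED_OPEN,
        RS_ZK_REGION_CLOSED
    }

    // Assumption: these transitions can be handled even if the reporting RS is gone,
    // because the RS has already given up on the region.
    static final Set<EventType> SAFE_WHEN_OFFLINE =
        EnumSet.of(EventType.RS_ZK_REGION_FAILED_OPEN, EventType.RS_ZK_REGION_CLOSED);

    /**
     * Returns true if the transition should be processed.
     * serverOnline and isMaster mirror the two checks in the existing guard.
     */
    static boolean shouldHandle(boolean serverOnline, boolean isMaster, EventType type) {
        if (serverOnline || isMaster) {
            return true; // same behaviour as the existing check
        }
        // New part: do not drop the event outright for an offline server.
        return SAFE_WHEN_OFFLINE.contains(type);
    }

    public static void main(String[] args) {
        // RS is down, but the region already failed to open: still handled.
        System.out.println(shouldHandle(false, false, EventType.RS_ZK_REGION_FAILED_OPEN));
        // RS is down in the middle of opening: still skipped, left to other handlers.
        System.out.println(shouldHandle(false, false, EventType.RS_ZK_REGION_OPENING));
    }
}
```

With a guard like this, the existing warning would be logged only for states that are not in
the safe set.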



> Rolling restart RSs scenario, regions could stay in OPENING state
> -----------------------------------------------------------------
>
>                 Key: HBASE-4446
>                 URL: https://issues.apache.org/jira/browse/HBASE-4446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.92.0
>
>         Attachments: HBASE-4446-trunk.patch
>
>
> Keep Master up all the time, do rolling restart of RSs like this - stop RS1, wait for
> 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start RS2, wait for 2 seconds,
> etc. Region sometimes can just stay in OPENING state even after timeoutmonitor period.
> 2011-09-19 08:10:33,131 WARN org.apache.hadoop.hbase.master.AssignmentManager: While
> timing out a region in state OPENING, found ZK node in unexpected state: RS_ZK_REGION_FAILED_OPEN
> The issue - RS was shutdown when a region is being opened, it was transitioned to RS_ZK_REGION_FAILED_OPEN
> in ZK. In timeoutmonitor, it didn't take care of RS_ZK_REGION_FAILED_OPEN.
> processOpeningState
> ...
>    else if (dataInZNode.getEventType() != EventType.RS_ZK_REGION_OPENING &&
>         LOG.warn("While timing out a region in state OPENING, "
>             + "found ZK node in unexpected state: "
>             + dataInZNode.getEventType());
>         return;
>       }
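
The gap described in the issue text above can be illustrated with a small decision function.
This is only a sketch under assumptions: OpeningTimeoutSketch, Action, and decide are
hypothetical names, the EventType enum is a trimmed stand-in, and treating
RS_ZK_REGION_FAILED_OPEN as a reason to reassign is one possible way to handle the state,
not necessarily what the attached patch does.

```java
public class OpeningTimeoutSketch {

    /** Hypothetical stand-in for HBase's EventType; only states named in this thread. */
    enum EventType {
        RS_ZK_REGION_OPENING,
        RS_ZK_REGION_FAILED_OPEN,
        RS_ZK_REGION_CLOSED
    }

    /** What the timeout monitor does with a region stuck in OPENING. */
    enum Action { REASSIGN, WARN_AND_SKIP }

    static Action decide(EventType znodeState) {
        switch (znodeState) {
            case RS_ZK_REGION_OPENING:
                // Still opening after the timeout: try the assignment again.
                return Action.REASSIGN;
            case RS_ZK_REGION_FAILED_OPEN:
                // Assumption: the RS gave up (for example, it was shut down mid-open),
                // so reassign instead of logging and returning.
                return Action.REASSIGN;
            default:
                // Any other state is still unexpected here.
                return Action.WARN_AND_SKIP;
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(EventType.RS_ZK_REGION_FAILED_OPEN));
        System.out.println(decide(EventType.RS_ZK_REGION_CLOSED));
    }
}
```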

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
