hbase-issues mailing list archives

From "Abhishek Singh Chouhan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-14889) Region stuck in transition in OPEN state indefinitely in corner scenario
Date Thu, 26 Nov 2015 07:32:11 GMT
Abhishek Singh Chouhan created HBASE-14889:

             Summary: Region stuck in transition in OPEN state indefinitely in corner scenario
                 Key: HBASE-14889
                 URL: https://issues.apache.org/jira/browse/HBASE-14889
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.98.14
            Reporter: Abhishek Singh Chouhan

During a failure scenario, when an RS dies and the bulk assigner (BA) is assigning its regions
to other RSs, if another RS dies (one to which some regions are being moved) while a region on it
is in PENDING_OPEN state, we end up in a situation where two bulk assigners try to assign the
same region on the same RS.

The following happened:
1. While one BA was opening the region, the second one saw it in PENDING_OPEN state, retried,
and called unassign(...), thereby sending a CLOSE RPC to the RS.
2. The RS had meanwhile already opened the region, changing the znode state to RS_ZK_REGION_OPENED,
which triggers an event on master.
3. On master, after the unassign succeeds, we go on to delete the znode, change the region
state to PENDING_OPEN, and send an OPEN RPC to the RS.
4. The earlier triggered event now sees the state as PENDING_OPEN and happily changes it to
OPEN, but is unable to delete the znode, which by this time is no longer in RS_ZK_REGION_OPENED state
but in M_ZK_REGION_OFFLINE state. Hence the region remains in transition in the OPEN state.
5. RS goes on to changing the znode states and successfully opens the region (changes znode
6. This again triggers an event on master, but this time, since the state is OPEN, the following
code path is taken:
          // Should see OPENED after OPENING but possible after PENDING_OPEN.
          if (regionState == null
              || !regionState.isPendingOpenOrOpeningOnServer(sn)) {
            LOG.warn("Received OPENED for " + prettyPrintedRegionName
              + " from " + sn + " but the region isn't PENDING_OPEN/OPENING here: "
              + regionStates.getRegionState(encodedName));

            if (regionState != null) {
              // Close it without updating the internal region states,
              // so as not to create double assignments in unlucky scenarios
              // mentioned in OpenRegionHandler#process
              unassign(regionState.getRegion(), null, -1, null, false, sn);
We call unassign here with transitionInZK=false and state=null.
7. The RS closes the region but doesn't update ZK, and the state is not changed in master either. The
region remains in transition in the OPEN state when it's actually closed. We have to restart the RS,
after which the region opens correctly on some other RS.
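
The sequence above can be replayed in a toy model, which may make the ordering of events easier to follow. This is only a sketch and not HBase code: the class StuckRegionSketch, its fields, and its methods are hypothetical, the master state and znode state are reduced to the values named in this report, and the master is assumed to recreate the znode as M_ZK_REGION_OFFLINE when it sends the OPEN RPC.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy replay of the race described above. All names here are hypothetical;
// none of this is actual HBase AssignmentManager code.
class StuckRegionSketch {
    enum MasterState { PENDING_OPEN, OPEN }
    enum ZNode { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENED }

    MasterState master;      // region state as tracked in master memory
    ZNode znode;             // state stored in the region's znode (null = deleted)
    boolean inTransition;    // master still lists the region as in transition
    boolean openOnRs;        // whether the RS is actually serving the region
    final Deque<String> pendingEvents = new ArrayDeque<>();

    // Master (re)creates the znode as offline and sends the OPEN RPC (assumed model).
    void masterSendsOpen() {
        znode = ZNode.M_ZK_REGION_OFFLINE;
        master = MasterState.PENDING_OPEN;
        inTransition = true;
    }

    // RS opens the region and moves the znode to OPENED, which queues a master event.
    void rsOpensRegion() {
        openOnRs = true;
        znode = ZNode.RS_ZK_REGION_OPENED;
        pendingEvents.add("OPENED");
    }

    // RS closes the region without touching ZK.
    void rsClosesRegion() {
        openOnRs = false;
    }

    // Master handles one queued OPENED event.
    void masterHandlesOpenedEvent() {
        pendingEvents.remove();
        if (master == MasterState.PENDING_OPEN) {
            master = MasterState.OPEN;
            // The znode is deleted only if it is still in the OPENED state.
            if (znode == ZNode.RS_ZK_REGION_OPENED) {
                znode = null;
                inTransition = false;
            }
        } else {
            // Not PENDING_OPEN/OPENING: close on the RS with transitionInZK=false
            // and state=null, so neither ZK nor the master state is updated.
            rsClosesRegion();
        }
    }

    static StuckRegionSketch replay() {
        StuckRegionSketch s = new StuckRegionSketch();
        s.masterSendsOpen();          // first BA assigns the region
        s.rsOpensRegion();            // step 2: RS opens, event queued
        s.rsClosesRegion();           // steps 1 and 3: second BA's unassign sends CLOSE
        s.masterSendsOpen();          // step 3: znode reset, PENDING_OPEN, OPEN RPC
        s.masterHandlesOpenedEvent(); // step 4: stale event sets OPEN, znode not deleted
        s.rsOpensRegion();            // step 5: RS opens again, new event queued
        s.masterHandlesOpenedEvent(); // steps 6 and 7: region closed silently
        return s;
    }

    public static void main(String[] args) {
        StuckRegionSketch s = replay();
        System.out.println("master=" + s.master
            + " inTransition=" + s.inTransition
            + " openOnRs=" + s.openOnRs);
    }
}
```

In this model the replay ends with master=OPEN, inTransition=true and openOnRs=false, which mirrors the stuck state reported in step 7.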
