hbase-issues mailing list archives

From "Stephen Yuan Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14889) Region stuck in transition in OPEN state indefinitely in corner scenario
Date Fri, 08 Jan 2016 16:38:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089473#comment-15089473 ]

Stephen Yuan Jiang commented on HBASE-14889:

[~pankaj2461], how is your testing of the patch going? Could you post the patch here so that we
can make progress?

> Region stuck in transition in OPEN state indefinitely in corner scenario
> ------------------------------------------------------------------------
>                 Key: HBASE-14889
>                 URL: https://issues.apache.org/jira/browse/HBASE-14889
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.14, 1.0.2
>            Reporter: Abhishek Singh Chouhan
>            Assignee: Pankaj Kumar
> During a failure scenario in which an RS dies and the bulk assigner (BA) is assigning its regions
to other RSs, if another RS dies (one to which some regions are being moved and on which a region
is in pending open state), we end up in a situation where two bulk assigners try to assign the
same region on the same RS.
> The following happened:
> 1. While one BA was opening the region, the second one saw it in pending open state,
retried and called unassign(...), thereby sending a CLOSE RPC to the RS.
> 2. Meanwhile, the RS has already opened the region and changed the znode state to
RS_ZK_REGION_OPENED, which triggers an event on the master.
> 3. On the master, after the unassign succeeds, we go on to delete the znode, change the
region state to PENDING_OPEN and send an OPEN RPC to the RS.
> 4. The event triggered earlier now sees the state as PENDING_OPEN and happily changes
it to OPEN, but is unable to delete the znode, which by this time is no longer in the
RS_ZK_REGION_OPENED state but in the M_ZK_REGION_OFFLINE state. Hence the region remains in
transition in the OPEN state.
> 5. The RS goes on to change the znode states and successfully opens the region (changing
the znode state to RS_ZK_REGION_OPENED).
> 6. This again triggers an event on the master, but this time, since the state is OPEN, the
following code path is taken:
> {noformat}
>           // Should see OPENED after OPENING but possible after PENDING_OPEN.
>           if (regionState == null
>               || !regionState.isPendingOpenOrOpeningOnServer(sn)) {
>             LOG.warn("Received OPENED for " + prettyPrintedRegionName
>               + " from " + sn + " but the region isn't PENDING_OPEN/OPENING here: "
>               + regionStates.getRegionState(encodedName));
>             if (regionState != null) {
>               // Close it without updating the internal region states,
>               // so as not to create double assignments in unlucky scenarios
>               // mentioned in OpenRegionHandler#process
>               unassign(regionState.getRegion(), null, -1, null, false, sn);
>             }
>             return;
>           }
> {noformat}
> We call unassign here with transitionInZK=false and state=null.
> 7. The RS closes the region but doesn't update ZK, and the state is not changed on the master.
The region remains in transition in the OPEN state when it's actually closed. We have to restart
the RS, after which the region opens correctly on some other RS.
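
The event ordering in steps 1 to 7 can be replayed with a small toy model. This is only a sketch for following the sequence, not HBase code: the class StuckRegionReplay, its fields and its methods are all hypothetical names, the znode is reduced to a single enum value, and the queued OPENED event from step 2 is modelled simply by calling the master handler later than the RS open that produced it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy replay of the event ordering described above; hypothetical names, not HBase code. */
final class StuckRegionReplay {
    enum MasterState { PENDING_OPEN, OPENING, OPEN }
    enum ZNode { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENED, DELETED }

    // State after the first bulk assigner has sent its OPEN RPC.
    MasterState masterState = MasterState.PENDING_OPEN;
    ZNode znode = ZNode.M_ZK_REGION_OFFLINE;
    boolean inTransition = true;   // master still tracks the region as in transition
    boolean openOnRs = false;      // whether the RS is actually serving the region
    final List<String> trace = new ArrayList<>();

    /** RS finishes an open and moves the znode to OPENED (steps 2 and 5). */
    void rsOpensRegion() {
        openOnRs = true;
        znode = ZNode.RS_ZK_REGION_OPENED;
        record("RS opened region");
    }

    /** Second BA retries: CLOSE RPC, znode back to OFFLINE, new OPEN RPC (steps 1 and 3). */
    void masterRetriesAssign() {
        openOnRs = false;
        znode = ZNode.M_ZK_REGION_OFFLINE;
        masterState = MasterState.PENDING_OPEN;
        record("master retried assign");
    }

    /** Master handles an OPENED event (steps 4 and 6). */
    void masterHandlesOpened() {
        boolean expected = masterState == MasterState.PENDING_OPEN
                || masterState == MasterState.OPENING;
        if (!expected) {
            // Mirrors the quoted guard: close on the RS, leave ZK and master state alone.
            openOnRs = false;
            record("master closed region without state update");
            return;
        }
        masterState = MasterState.OPEN;
        if (znode == ZNode.RS_ZK_REGION_OPENED) {
            znode = ZNode.DELETED;
            inTransition = false;
        }
        // Otherwise the znode delete fails and the region stays in transition.
        record("master marked region OPEN");
    }

    private void record(String step) {
        trace.add(step + " -> master=" + masterState + ", znode=" + znode
                + ", inTransition=" + inTransition + ", openOnRs=" + openOnRs);
    }

    /** Runs the sequence from the report and returns the final (step 7) state. */
    static StuckRegionReplay replay() {
        StuckRegionReplay r = new StuckRegionReplay();
        r.rsOpensRegion();        // step 2: OPENED event is now pending on the master
        r.masterRetriesAssign();  // steps 1 and 3
        r.masterHandlesOpened();  // step 4: the stale event is processed
        r.rsOpensRegion();        // step 5
        r.masterHandlesOpened();  // step 6
        return r;
    }

    public static void main(String[] args) {
        replay().trace.forEach(System.out::println);
    }
}
```

In this model the replay ends with the master holding the region as OPEN and still in transition while the RS no longer serves it, which matches the stuck state described in step 7.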

This message was sent by Atlassian JIRA
