hbase-issues mailing list archives

From "Jieshan Bean (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3937) Region PENDING-OPEN timeout with un-expected ZK node state leads to an endless loop
Date Wed, 01 Jun 2011 00:28:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041909#comment-13041909 ]

Jieshan Bean commented on HBASE-3937:

I have thought about that, but I'm afraid it introduces another problem: it forces the ZK
nodes of all regions in RIT to OFFLINE every time. If a node's original state is already
OFFLINE, it will be reset again. I don't know whether that is a problem.
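
To make the concern concrete, here is a minimal sketch of a redundant reset. It is a toy model, not HBase code: ZkNode, forceOffline and forceOfflineIfNeeded are hypothetical names, and the only behavior assumed is that every write to a node bumps its version, as a ZooKeeper setData call does. Whether the extra write actually matters to the master is exactly the open question.

```java
// Toy model only; ZkNode and both helpers are hypothetical, not HBase classes.
class ForceOfflineSketch {
    enum ZkState { OFFLINE, OPENING }

    static final class ZkNode {
        ZkState state;
        int version = 0;

        ZkNode(ZkState initial) { this.state = initial; }

        // Every write bumps the version, even when the state does not change.
        void write(ZkState next) { state = next; version++; }
    }

    // Unconditional reset: rewrites the node even if it is already OFFLINE.
    static void forceOffline(ZkNode node) { node.write(ZkState.OFFLINE); }

    // Guarded reset: leaves a node alone when it is already OFFLINE.
    static void forceOfflineIfNeeded(ZkNode node) {
        if (node.state != ZkState.OFFLINE) {
            node.write(ZkState.OFFLINE);
        }
    }

    // Returns the node version after one reset, starting from the given state.
    static int versionAfterReset(ZkState initial, boolean guarded) {
        ZkNode node = new ZkNode(initial);
        if (guarded) {
            forceOfflineIfNeeded(node);
        } else {
            forceOffline(node);
        }
        return node.version;
    }

    public static void main(String[] args) {
        System.out.println("unconditional, already OFFLINE: " + versionAfterReset(ZkState.OFFLINE, false));
        System.out.println("guarded, already OFFLINE:       " + versionAfterReset(ZkState.OFFLINE, true));
        System.out.println("guarded, was OPENING:           " + versionAfterReset(ZkState.OPENING, true));
    }
}
```

In this model the unconditional reset touches a node that needed no change, while the guarded variant only writes when the state differs from OFFLINE.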

> Region PENDING-OPEN timeout with un-expected ZK node state leads to an endless loop
> -----------------------------------------------------------------------------------
>                 Key: HBASE-3937
>                 URL: https://issues.apache.org/jira/browse/HBASE-3937
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.4
> Here is the scenario in which this problem occurred:
> 1. HMaster assigned region A to RS1, so the RegionState was set to PENDING_OPEN.
> 2. Because there were too many open requests, the open process on RS1 was blocked.
> 3. Some time later, TimeoutMonitor found that the assignment of A had timed out. Because the
> RegionState was PENDING_OPEN, it went into the following handler (which just queues the region for reassignment):
>    case PENDING_OPEN:
>       LOG.info("Region has been PENDING_OPEN for too " +
>           "long, reassigning region=" +
>           regionInfo.getRegionNameAsString());
>       assigns.put(regionState.getRegion(), Boolean.TRUE);
>       break;
> So we can see that, in this case, the ZK node state is assumed to be OFFLINE. In the
> normal flow, that is indeed fine.
> 4. But before the actual reassignment happened, the pending request on RS1 was processed. That
> affected the new assignment, because it updated the ZK node state from OFFLINE to OPENING.
> 5. The new assignment started and sent the region to RS2 to open. While opening, RS2
> must update the ZK node state from OFFLINE to OPENING. Because the current state was already
> OPENING, this operation failed.
> So this region could never be opened successfully.
> So I think that, to avoid this problem, in the PENDING_OPEN case of TimeoutMonitor
> we should set the ZK node state to OFFLINE first.
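
The sequence in steps 1 to 5 and the proposed fix can be sketched as a small simulation. This is a toy model under stated assumptions, not HBase code: ZkNode, transition, force and reassignAfterTimeout are hypothetical names, the ZK node is modeled with an AtomicReference whose compareAndSet stands in for a checked state transition, and the forced OFFLINE is assumed to happen at reassignment time, after RS1's stale request has already been processed.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of the scenario; all names here are hypothetical, not HBase classes.
class PendingOpenRaceSketch {
    enum ZkState { OFFLINE, OPENING }

    // Stand-in for the region's ZK node.
    static final class ZkNode {
        private final AtomicReference<ZkState> state =
                new AtomicReference<>(ZkState.OFFLINE);

        // Checked transition: succeeds only if the node is in the expected state.
        boolean transition(ZkState expected, ZkState next) {
            return state.compareAndSet(expected, next);
        }

        // Unchecked write, used to model forcing the node to OFFLINE.
        void force(ZkState next) {
            state.set(next);
        }
    }

    // Returns whether RS2 manages to move the node from OFFLINE to OPENING.
    static boolean reassignAfterTimeout(boolean forceOfflineFirst) {
        // Step 1: the master assigns region A to RS1; the node starts as OFFLINE.
        ZkNode node = new ZkNode();
        // Steps 2-3: RS1 is blocked and TimeoutMonitor queues the region again.
        // Step 4: RS1 finally handles the stale request: OFFLINE -> OPENING.
        node.transition(ZkState.OFFLINE, ZkState.OPENING);
        // Proposed fix (assumed to run just before the new assignment).
        if (forceOfflineFirst) {
            node.force(ZkState.OFFLINE);
        }
        // Step 5: RS2 tries the same OFFLINE -> OPENING transition.
        return node.transition(ZkState.OFFLINE, ZkState.OPENING);
    }

    public static void main(String[] args) {
        System.out.println("without fix, RS2 can open: " + reassignAfterTimeout(false));
        System.out.println("with fix, RS2 can open:    " + reassignAfterTimeout(true));
    }
}
```

Without the forced reset, RS2's transition fails because the node is already OPENING. With it, the transition succeeds. The model does not cover what RS1 does after its node is reset underneath it.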

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
