hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3937) Region PENDING-OPEN timeout with un-expected ZK node state leads to an endless loop
Date Wed, 01 Jun 2011 22:13:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042478#comment-13042478 ]

Jean-Daniel Cryans commented on HBASE-3937:

I'm not sure that patch would be better, starting with the fact that it copies a bunch of
code from the next switch case.

Thinking more about this problem, I believe that in your original case you almost had a double
assignment (and the patch you propose would really make it a double assignment).

Let's say the region times out on PENDING_OPEN, but by the time it gets processed it's already
opened by the RS. What you had originally is that it would keep bouncing because RS2 can't
open the region; with the patch, the assignment should go through since the ZK state is cleared.

It's still unclear to me why your RS1 didn't go through and finally open it (it should be
in your logs though), but we have to consider both possibilities.

I'm starting to think that there won't be any easy solution; we need to rewrite how TimeoutMonitor
does its thing. Anything else would just be band-aids that will never fix all the problems.

The way it should work is the following:

 - It should not create a list of unassigns and assigns, since by the time the list is processed
the situation has probably changed (I witnessed that a lot).
 - This means the action should be taken as we go through the first loop.
 - One of the major issues is the lack of atomicity, so any action taken should first check
the current state, keep the version number, decide on the corrective measure, and update the
znode by expecting the version it first got.
 - If the update of the znode is successful, we know for sure that the operation will be
seen by the region servers.
 - If it's not successful, the situation needs to be reassessed.
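
The read, decide, and conditional-update steps in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not HBase code: `VersionedNodeSketch`, `Node`, `Snapshot`, `setIfVersion` and `correct` are hypothetical names, and the znode is replaced by an in-memory stand-in so the example runs on its own. Against a real ensemble, the equivalent primitive is ZooKeeper's `setData(path, data, version)`, which fails with a `BadVersionException` when the expected version does not match.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for a versioned znode; not HBase or ZooKeeper code.
final class VersionedNodeSketch {
    enum State { OFFLINE, OPENING, OPENED }

    /** Immutable view of the node: its state and the version it was read at. */
    static final class Snapshot {
        final State state;
        final int version;
        Snapshot(State state, int version) {
            this.state = state;
            this.version = version;
        }
    }

    static final class Node {
        private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(State.OFFLINE, 0));

        Snapshot read() {
            return current.get();
        }

        /** Writes newState only if the node is still at expectedVersion. */
        boolean setIfVersion(State newState, int expectedVersion) {
            Snapshot seen = current.get();
            if (seen.version != expectedVersion) {
                return false; // somebody else updated the node since it was read
            }
            return current.compareAndSet(seen, new Snapshot(newState, expectedVersion + 1));
        }
    }

    /**
     * Acts on one region immediately instead of queueing the action for later.
     * 'decide' is a placeholder for whatever corrective measure is chosen; it
     * returns the state the node should move to (or the same state for "no change").
     */
    static boolean correct(Node node, UnaryOperator<State> decide, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Snapshot seen = node.read();              // check state, keep the version
            State target = decide.apply(seen.state);  // decide on the corrective measure
            if (target == seen.state) {
                return true;                          // nothing to correct
            }
            if (node.setIfVersion(target, seen.version)) {
                return true;                          // update is now visible to everyone
            }
            // Version mismatch: the situation changed, so loop and reassess.
        }
        return false;
    }

    public static void main(String[] args) {
        Node node = new Node();
        node.setIfVersion(State.OPENING, 0);                 // a region server moves the node
        boolean stale = node.setIfVersion(State.OFFLINE, 0); // a write based on an old read
        boolean fixed = correct(node, s -> s == State.OPENING ? State.OFFLINE : s, 3);
        System.out.println("stale write accepted: " + stale);
        System.out.println("corrected: " + fixed + ", state=" + node.read().state);
    }
}
```

The point of the sketch is that a decision made from an old read cannot silently overwrite a newer state: the stale write is rejected, and only a decision based on the current version goes through.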

This is clearly not something for 0.90; that's one of the reasons we set the timeout much higher
than 30 seconds in 0.90.3. That's my conclusion at the end of HBASE-3669.

> Region PENDING-OPEN timeout with un-expected ZK node state leads to an endless loop
> -----------------------------------------------------------------------------------
>                 Key: HBASE-3937
>                 URL: https://issues.apache.org/jira/browse/HBASE-3937
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.4
> Here is the scenario in which this problem happened:
> 1. HMaster assigned region A to RS1, so the RegionState was set to PENDING_OPEN.
> 2. Because there were too many open requests, the open process on RS1 was blocked.
> 3. Some time later, TimeoutMonitor found that the assignment of A had timed out. Because the
> RegionState was PENDING_OPEN, it went into the following handler (which just puts the region into a waiting-assigning
>    case PENDING_OPEN:
>       LOG.info("Region has been PENDING_OPEN for too " +
>           "long, reassigning region=" +
>           regionInfo.getRegionNameAsString());
>       assigns.put(regionState.getRegion(), Boolean.TRUE);
>       break;
> So we can see that, in this case, we assume the ZK node state is OFFLINE. In the normal
> flow, that is fine.
> 4. But before the actual reassignment, RS1 processed its pending requests. That interfered
> with the new assignment, because RS1 updated the ZK node state from OFFLINE to OPENING.
> 5. The new assignment started, so the region was sent to RS2 to open. While opening, RS2
> has to update the ZK node state from OFFLINE to OPENING. Because the current state was already
> OPENING, this operation failed.
> So this region could never be opened successfully.
> So I think that, to avoid this problem, in the PENDING_OPEN case of TimeoutMonitor
> we should first transition the ZK node state to OFFLINE.
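
Steps 4 and 5 of the quoted description can be reproduced with a toy model. This is a rough sketch under assumptions, not the actual region server logic: `PendingOpenRaceSketch`, `ZkState` and `tryStartOpening` are hypothetical names, and the ZK node is modelled as a single in-memory reference whose only allowed transition here is OFFLINE to OPENING.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical toy model of the race in the issue description; not HBase code.
final class PendingOpenRaceSketch {
    enum ZkState { OFFLINE, OPENING, OPENED }

    /** A region server may only start opening a region whose node is still OFFLINE. */
    static boolean tryStartOpening(AtomicReference<ZkState> node) {
        return node.compareAndSet(ZkState.OFFLINE, ZkState.OPENING);
    }

    public static void main(String[] args) {
        // Region A's node starts OFFLINE after the master assigns it to RS1.
        AtomicReference<ZkState> regionA = new AtomicReference<>(ZkState.OFFLINE);

        // Step 4: RS1 finally gets to its backlog and moves the node to OPENING.
        boolean rs1 = tryStartOpening(regionA);

        // Step 5: the master has reassigned to RS2, which expects OFFLINE and fails.
        boolean rs2 = tryStartOpening(regionA);

        System.out.println("RS1 started opening: " + rs1);
        System.out.println("RS2 started opening: " + rs2);
        System.out.println("node state: " + regionA.get());
    }
}
```

In this model, resetting the node to OFFLINE before reassigning would let RS2 succeed, but RS1 may still be in the middle of its own open, which is the double-assignment concern raised in the comment above.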

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
