hbase-issues mailing list archives

From "Amitanand Aiyer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8281) Unassigned regions: dropped messages from Master to RS
Date Fri, 05 Apr 2013 17:43:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623837#comment-13623837 ]

Amitanand Aiyer commented on HBASE-8281:

AFAIK, the master code was largely rewritten in 0.90, so I am not sure whether this affects trunk as well.
> Unassigned regions: dropped messages from Master to RS
> ------------------------------------------------------
>                 Key: HBASE-8281
>                 URL: https://issues.apache.org/jira/browse/HBASE-8281
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.89-fb
>            Reporter: Amitanand Aiyer
> We have seen a couple of scenarios where a transient network issue between the RS and Master results in regions being unassigned (and staying unassigned) until someone intervenes manually with hbck -fix.
> The events occur as follows. 
> RS checks in for a regionServerReport.
>   Master wants to assign a region to the RS, so it adds a MSG_REGION_OPEN message to the returned results and marks the region as PENDING_OPEN.
>   The message from the master to the RS is not delivered due to a network error. The master does nothing to revert the state change.
> The network heals, and the RS is able to send regionServerReports again; it is in good standing with the master. But the RS does not know that it has to open the region, while the master thinks that the RS is going to open it.
> Region remains unassigned until we intervene with hbck.
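
To make the sequence above concrete, here is a toy model of it in plain Java. This is not HBase code: the class, the `deliver` helper, and the `RegionState` enum are all made up for illustration, and only the MSG_REGION_OPEN and PENDING_OPEN names come from the description. It only shows how a state change made before delivery leaves the region stuck once the reply is dropped.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the dropped-message sequence; all names here are hypothetical. */
public class DroppedOpenDemo {
    enum RegionState { UNASSIGNED, PENDING_OPEN }

    /** Master that marks a region PENDING_OPEN as soon as it queues the open message. */
    static class Master {
        final Map<String, RegionState> states = new HashMap<>();

        /** Handles a region server report and returns the message meant for the RS, if any. */
        String regionServerReport(String region) {
            RegionState state = states.getOrDefault(region, RegionState.UNASSIGNED);
            if (state == RegionState.UNASSIGNED) {
                // State is changed before we know whether the reply was delivered.
                states.put(region, RegionState.PENDING_OPEN);
                return "MSG_REGION_OPEN " + region;
            }
            return null; // master believes the open is already in flight
        }
    }

    /** Returns what the RS actually sees; a dropped reply looks like "no messages". */
    static String deliver(String reply, boolean networkUp) {
        return networkUp ? reply : null;
    }

    public static void main(String[] args) {
        Master master = new Master();
        String first = deliver(master.regionServerReport("r1"), false);  // reply dropped
        String second = deliver(master.regionServerReport("r1"), true);  // network healed
        System.out.println("first=" + first + " second=" + second
                + " state=" + master.states.get("r1"));
    }
}
```

In this model the RS never sees the open message, yet the master keeps the region in PENDING_OPEN, which matches the stuck state described above.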
> Possible fix:
>   I think it may be a mistake to unilaterally change the RegionState to PENDING_OPEN as soon as the master decides to send the message. Perhaps we should create an intermediate state in which the master keeps sending the OPEN message to the RS until the RS acks, and update the RegionState to PENDING_OPEN only after the RS has acked.
> While this would fix the particular scenario that caused the unassigned regions, we might want to update all Master-RS communication (even region closes?) to expect message failures and wait for an ack before updating the state in the master.
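
A minimal sketch of the proposed fix, again as a toy model in plain Java rather than actual HBase code. The intermediate state name `SENDING_OPEN`, the `rsAckedOpen` flag on the report, and the class names are assumptions made for illustration; the issue does not specify how the ack would be carried.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the ack-before-state-change idea; all names here are hypothetical. */
public class AckedOpenDemo {
    // SENDING_OPEN is the assumed intermediate state: message queued, not yet acked.
    enum RegionState { UNASSIGNED, SENDING_OPEN, PENDING_OPEN }

    static class Master {
        final Map<String, RegionState> states = new HashMap<>();

        /**
         * Handles a region server report. rsAckedOpen is true when the report
         * acknowledges an open message the RS received earlier.
         */
        String regionServerReport(String region, boolean rsAckedOpen) {
            RegionState state = states.getOrDefault(region, RegionState.UNASSIGNED);
            if (state == RegionState.SENDING_OPEN && rsAckedOpen) {
                // Only now does the master record that the RS is going to open the region.
                states.put(region, RegionState.PENDING_OPEN);
                return null;
            }
            if (state == RegionState.UNASSIGNED || state == RegionState.SENDING_OPEN) {
                states.put(region, RegionState.SENDING_OPEN);
                return "MSG_REGION_OPEN " + region; // resent on every report until acked
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Master master = new Master();
        boolean rsGotOpen = false;
        boolean[] networkUp = {false, true, true}; // first reply is dropped
        for (boolean up : networkUp) {
            String reply = master.regionServerReport("r1", rsGotOpen);
            if (up && reply != null) {
                rsGotOpen = true; // RS received the open and will ack on its next report
            }
            System.out.println("networkUp=" + up + " state=" + master.states.get("r1"));
        }
    }
}
```

With this shape the dropped reply only delays the assignment by one report. Since the master may resend an open that the RS has already received, the RS side would need to treat duplicate open messages as harmless.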

