hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2700) Handle master failover for regions in transition
Date Thu, 24 Jun 2010 14:45:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882169#action_12882169 ]

Jonathan Gray commented on HBASE-2700:
--------------------------------------

During master failover, the most important thing is to determine whether there are regions
that need the master to take action to get back to normal operation (for example, regions
that were being moved for load balancing where the master had issued the closeRegion but not
yet the openRegion).  Anything the master would need to clean up immediately would be a region
in transition.  That information would be available immediately via ZK with this new design.
That seems superior to requiring each RS to report everything it holds and diffing that
against a full META scan, which could take some time and is more complex.
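
To illustrate the idea, here is a minimal sketch of how a new master might turn the
regions-in-transition data read from ZK into a list of follow-up actions.  This is not the
HBASE-2700 implementation: the class name, the state names, and the action names are all
hypothetical, and the input map stands in for whatever the master would actually read from
the transition znodes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: map each region found in transition to the step a
// newly started master would take next. None of these names come from HBase.
final class FailoverRecoverySketch {

    // Assumed transition states as they might be recorded in ZK.
    enum TransitionState { CLOSING, CLOSED, OPENING, OPENED }

    // Assumed follow-up actions for the new master.
    enum MasterAction { WAIT_FOR_CLOSE, ASSIGN, WAIT_FOR_OPEN, FINISH_OPEN }

    static MasterAction actionFor(TransitionState state) {
        switch (state) {
            case CLOSING: return MasterAction.WAIT_FOR_CLOSE;
            // The close finished but the old master never issued the open.
            case CLOSED:  return MasterAction.ASSIGN;
            case OPENING: return MasterAction.WAIT_FOR_OPEN;
            case OPENED:  return MasterAction.FINISH_OPEN;
        }
        throw new IllegalArgumentException("unknown state: " + state);
    }

    // Only regions in transition are inspected; regions that are not in
    // transition never appear in the input and need no immediate action.
    static Map<String, MasterAction> planRecovery(Map<String, TransitionState> inTransition) {
        Map<String, MasterAction> plan = new LinkedHashMap<>();
        for (Map.Entry<String, TransitionState> e : inTransition.entrySet()) {
            plan.put(e.getKey(), actionFor(e.getValue()));
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, TransitionState> inTransition = new LinkedHashMap<>();
        inTransition.put("region-a", TransitionState.CLOSED);
        inTransition.put("region-b", TransitionState.OPENING);
        System.out.println(planRecovery(inTransition));
    }
}
```

The work done here is proportional to the number of regions in transition rather than to the
total number of regions, which is the advantage argued for above.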

The other thing we'd need to know is whether any RS failed at a time when no master was around
to process its termination (by reassigning its regions to others).  For this, we propose a
second list of online RS that is maintained by the master (in addition to the ephemeral node
put up by each RS).  Whenever the master actually processes an RS shutdown or an RS coming
online, it would update its list.  A new master would diff these two lists of znodes to
determine whether anything changed during failover.
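
The diff of the two lists can be sketched as a pair of set differences.  This is only an
illustration under assumptions: the class and method names are hypothetical, and the two sets
stand in for the children of the master-maintained znode and of the znode holding the RS
ephemeral nodes (which a real master could list with ZooKeeper's getChildren call).

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: compare the master-maintained list of online region
// servers with the ephemeral nodes the region servers currently hold.
final class OnlineServerDiff {

    // In the master's list but with no ephemeral node: the server died and
    // no master processed its shutdown, so its regions need reassigning.
    static Set<String> diedUnprocessed(Set<String> masterList, Set<String> ephemeral) {
        Set<String> result = new TreeSet<>(masterList);
        result.removeAll(ephemeral);
        return result;
    }

    // Has an ephemeral node but is missing from the master's list: the
    // server came online and no master processed it yet.
    static Set<String> joinedUnprocessed(Set<String> masterList, Set<String> ephemeral) {
        Set<String> result = new TreeSet<>(ephemeral);
        result.removeAll(masterList);
        return result;
    }

    public static void main(String[] args) {
        Set<String> masterList = new TreeSet<>(Arrays.asList("rs1", "rs2", "rs3"));
        Set<String> ephemeral = new TreeSet<>(Arrays.asList("rs2", "rs3", "rs4"));
        System.out.println(diedUnprocessed(masterList, ephemeral));   // prints [rs1]
        System.out.println(joinedUnprocessed(masterList, ephemeral)); // prints [rs4]
    }
}
```

If both differences are empty, nothing changed in the set of region servers while no master
was running.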

With the ZK-based assignment we would no longer need to do this META scan and RPC to each
server.  To me that seems rather desirable, and the ZK approach is faster.  Doesn't the
recovery style described in BT also require holding all region and assignment information in
memory?  There has been some discussion around whether we want to go that way or not, but for
large clusters the memory needed could get significant.  Even if we do go that way, I still
think reading the regions in transition out of ZK is a better way to ensure cluster sanity
when a failed-over master starts up.

> Handle master failover for regions in transition
> ------------------------------------------------
>
>                 Key: HBASE-2700
>                 URL: https://issues.apache.org/jira/browse/HBASE-2700
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master, zookeeper
>            Reporter: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.21.0
>
>
> To this point in HBASE-2692 tasks we have moved everything for regions in transition
> into ZK, but we have not fully handled the master failover case.  This is to deal with that
> and to write tests for it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

