Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Message-ID: <419418.43361277403530304.JavaMail.jira@thor>
Date: Thu, 24 Jun 2010 14:18:50 -0400 (EDT)
From: "Jonathan Gray (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Subject: [jira] Commented: (HBASE-2700) Handle master failover for regions
 in transition
In-Reply-To: <27617847.33701276033580816.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882262#action_12882262 ] 

Jonathan Gray commented on HBASE-2700:
--------------------------------------

In what situation does the data in ZK not reflect the actual state?  In order for a RS to, for example, open a region, it must transition a node in ZK from nothing, to OPENING, to OPENED; if any of those transitions fails, it does not open the region.  It seems to me that it is META which may not be up to date, and META which can change without the proper notifications being sent.
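
A minimal sketch of that open sequence, assuming an in-memory stand-in for the ZK nodes. `TransitionStore`, `transition` and `open_region` are hypothetical names for illustration, not the ZooKeeper client API or HBase code. Each step succeeds only if the node is still in the expected state and version, so a failed step means the region is not opened:

```python
from enum import Enum


class RegionState(Enum):
    OPENING = "OPENING"
    OPENED = "OPENED"


class TransitionStore:
    """In-memory stand-in for the ZK nodes that track regions in transition.

    Hypothetical illustration only. Each node holds (state, version),
    much as a znode holds data plus a version counter.
    """

    def __init__(self):
        self._nodes = {}  # region name -> (RegionState, version)

    def get(self, region):
        """Return (state, version), or (None, -1) if no node exists."""
        return self._nodes.get(region, (None, -1))

    def transition(self, region, expected, new, expected_version):
        """Move a node from `expected` to `new` only if nothing changed it.

        Returns the new version on success, or None if the check failed.
        """
        state, version = self.get(region)
        if state is not expected or version != expected_version:
            return None
        self._nodes[region] = (new, version + 1)
        return version + 1


def open_region(store, region):
    """Region server side: nothing -> OPENING -> OPENED, or do not open."""
    v = store.transition(region, None, RegionState.OPENING, -1)
    if v is None:
        return False  # a node already exists for this region
    # ... the actual work of opening the region would happen here ...
    v = store.transition(region, RegionState.OPENING, RegionState.OPENED, v)
    return v is not None
```

A second `open_region` call for the same region fails at the first step, because the node is no longer in the "nothing" state.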

In the style where we ask each RS what it hosts and match that up against META, we must then do all edits of META on the master side.  Otherwise there will always be race conditions between what the master thinks the state is (via a META scan) and what the actual state is (via RSs setting things in META).  ZK allows us to ensure we never miss states and transitions.

As for the second list, of RSs up in ZK, we could get this data from META, but what about the case where something was being assigned to an RS and the RS died before it finished opening it?  Whether this is a problem or not depends very much on who edits META, whether we rely on META to determine that something is not assigned, etc.

There has been consideration of how this is handled in the BT paper, but I am of the mindset that explicit, persistent message passing via ZK is a better direction than META scanning / per-RS check-in / heartbeating.  What happens if we have 1000 RSs and 1M regions?  That's a significant amount of work to do.  What if a single RS happens to be in a 10-second GC pause?  What about race conditions between what is in META and what the RSs know about?  What if we see in META that something is unassigned, but the previous master asked an RS to open it?  That RS is in the "opening" state but the region is not yet assigned, so would the region come back in the list of regions assigned to that server?  This is super explicit via transitions in ZK.
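
To make the failover case concrete, here is a rough sketch, assuming the new master can read a snapshot of the transition nodes from ZK. The function name, the snapshot shape and the action labels are all hypothetical and only illustrate the idea, not actual HBase behavior. The work is proportional to the number of regions in transition, not to the total number of regions or RSs:

```python
def recover_in_transition(nodes, live_servers):
    """Decide what a newly active master does with each transition node.

    `nodes` maps region name -> (state, server name), a hypothetical
    snapshot of the transition nodes read from ZK. The returned action
    labels are illustrative only.
    """
    plan = {}
    for region, (state, server) in nodes.items():
        if server not in live_servers:
            # The RS died before the transition was fully handled.
            plan[region] = "reassign"
        elif state == "OPENING":
            # The previous master asked a live RS to open it; let it finish.
            plan[region] = "wait_for_open"
        elif state == "OPENED":
            # The RS is done; the master completes the bookkeeping.
            plan[region] = "finish_assignment"
        else:
            # Unknown state: be conservative.
            plan[region] = "reassign"
    return plan


nodes = {
    "region-a": ("OPENING", "rs1"),
    "region-b": ("OPENED", "rs2"),
    "region-c": ("OPENING", "rs3"),
}
plan = recover_in_transition(nodes, live_servers={"rs1", "rs2"})
```

Here `region-c` is the case of an RS that died mid-open: its node still says OPENING, so the new master knows about the in-flight assignment without asking any RS.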

As for holding everything in memory, I think we can punt on this for a while.  The only thing pertinent to this discussion is that if holding it all in memory is possibly untenable, doesn't that mean that it's also untenable to do master failover in this style (holding every RS and its regions after asking it via RPC, plus the META view of every region and the RS it is assigned to)?

> Handle master failover for regions in transition
> ------------------------------------------------
>
>                 Key: HBASE-2700
>                 URL: https://issues.apache.org/jira/browse/HBASE-2700
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master, zookeeper
>            Reporter: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.21.0
>
>
> To this point in HBASE-2692 tasks we have moved everything for regions in transition into ZK, but we have not fully handled the master failover case.  This is to deal with that and to write tests for it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.