hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2700) Handle master failover for regions in transition
Date Thu, 24 Jun 2010 21:58:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882355#action_12882355 ]

stack commented on HBASE-2700:

.bq In what situation does the data in ZK not have the actual state

ZK is mediating the open/close of regions.  For the final word on what is actually happening
on a RS, I'd say asking the RS will be more reliable than asking ZK.

.bq It seems to me that it is META which may not be up to date and META which can change without
the proper notifications being sent. 

If meta doesn't have the list of regions and their locations, clients are not going to work.
It's a bug (I think we're clear on what zk is responsible for and where meta takes over).

Regarding your comment that Master then has to update meta rather than RS as you prefer, where
do you get that from?  From the BT quote above: "Whenever this scan encounters a tablet that
is not already assigned, the master adds the tablet to the set of unassigned tablets, which
makes the tablet eligible for tablet assignment."  Sounds like master just adds the region
to the unassigned list.... which in our case is a queue up in zk.
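
A rough sketch of that scan step, in Python for brevity. Everything here is hypothetical and is not HBase or Bigtable code: `queue_unassigned` is an invented name, .META. is modelled as a plain region-to-server dict, and the unassigned queue (which in our case would live up in zk) is modelled as an in-memory deque. Treating a region whose recorded server is not live as unassigned is my assumption, not something the quote says.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional


def queue_unassigned(
    meta: Dict[str, Optional[str]],
    live_servers: Iterable[str],
    queue: Deque[str],
) -> Deque[str]:
    """Scan meta and append regions that are not assigned to a live server.

    meta         -- region name -> server name, or None if no server is recorded
    live_servers -- servers currently known to be up
    queue        -- existing unassigned queue; stands in for the queue in zk
    """
    live = set(live_servers)
    already_queued = set(queue)
    # Sorted only to make the scan order deterministic in this sketch.
    for region in sorted(meta):
        server = meta[region]
        # Assumption: a region recorded on a dead server counts as unassigned.
        not_assigned = server is None or server not in live
        if not_assigned and region not in already_queued:
            queue.append(region)
            already_queued.add(region)
    return queue
```

Note that the master only queues the region here; it does not touch meta at all.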

.bq ZK allows us to ensure we never miss states and transitions.

I thought you said you can miss transitions in zk (Maybe you mean region transitions?).

.bq For second list of RS up in ZK, we could get this data in META but what about case where
a RS died while something was getting assigned to it but it did not finish opening and died?

Then it'll have nothing on it and no mention in .META.... it never existed and it never took
on data.  Where's the problem?  The babysit process will restart it and hopefully the second time
around it'll have more luck.

.bq ....persistent message passing via ZK is a better direction than the meta scanning / per-rs
check-in / heartbeating. 

You overreach with the above.  I'm talking about the process a master follows on assuming the
master role.  I'm not talking about heartbeating+messagepayload.  My suggestion that we consider what the
BT paper does is about getting what the RS is carrying from the horse's mouth rather than
from the mediator.... also, seems like we can simplify some by doing away w/ a second RS list
up in ZK?

.bq What if a single RS happens to be in a 10 second GC pause? 

Then RS will be slow to respond to master (or to zk).  Master can work on other RS reports in the meantime.

.bq What about race conditions between what is in META and what the RSs know about?

What sort of race condition are you thinking of?  Master asks the RS for what it has.  RS could
even volunteer opening/closing states if that'd help (or master can just see zk for transitions
-- might be good to have both to help understand state).  It reads .META. for the list of regions
and servers.  What do you think could go amiss?
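
To make that flow concrete, here is a minimal sketch in Python of how a master assuming the role might put the three sources together. All names are hypothetical and none of this is HBase API: .META. is modelled as a region-to-server dict, each RS report as the set of regions that server says it carries, and zk as the set of regions with a transition pending. Siding with the RS when it disagrees with .META. is an assumption of the sketch.

```python
from typing import Dict, Optional, Set


def rebuild_assignments(
    meta: Dict[str, Optional[str]],
    rs_reports: Dict[str, Set[str]],
    in_transition: Set[str],
) -> Dict[str, Set[str]]:
    """Classify regions after a master failover (illustrative only).

    meta          -- region -> server recorded in .META. (None if unassigned)
    rs_reports    -- server -> regions that server says it is carrying
    in_transition -- regions with an open/close pending in zk
    """
    # Invert the RS reports: which server claims each region.
    hosted = {region: server
              for server, regions in rs_reports.items()
              for region in regions}

    result: Dict[str, Set[str]] = {
        "ok": set(), "in_transition": set(),
        "unassigned": set(), "mismatch": set(),
    }
    for region, meta_server in meta.items():
        actual = hosted.get(region)
        if region in in_transition:
            # An open/close is pending; let it finish before judging.
            result["in_transition"].add(region)
        elif actual is None:
            # No reporting RS carries it: eligible for assignment.
            result["unassigned"].add(region)
        elif actual == meta_server:
            result["ok"].add(region)
        else:
            # RS and .META. disagree; the sketch only flags this.
            result["mismatch"].add(region)
    return result
```

The sketch ignores regions an RS reports that .META. has never heard of, and a region claimed by two servers; a real implementation would have to decide what to do with both.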

.bq What if we see in META something is unassigned but the previous master asked an RS to
open it?

See above.  RS could report openings or master can check zk or both.

On your second note, you have confidence that the representation of cluster state that is
up in zk is always true.  I don't have the same confidence.

> Handle master failover for regions in transition
> ------------------------------------------------
>                 Key: HBASE-2700
>                 URL: https://issues.apache.org/jira/browse/HBASE-2700
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master, zookeeper
>            Reporter: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.21.0
> To this point in HBASE-2692 tasks we have moved everything for regions in transition
into ZK, but we have not fully handled the master failover case.  This is to deal with that
and to write tests for it.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
