hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
Date Fri, 18 Oct 2013 22:48:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799619#comment-13799619

Sergey Shelukhin commented on HBASE-5487:

Answers lifted from email also (some fixes + one answer was modified due to clarification
here :)).

bq.  What is a failure and how do you react to failures? I think the master5 design needs
to spend more effort  to considering failure and recovery cases. I claim there are 4 types
of responses from a networked IO operation -  two states we normally deal with ack successful,
ack failed (nack) and unknown due to timeout  that succeeded (timeout success) and unknown
due to timeout that failed (timeout failed). We have historically missed the last two cases
and they aren't considered in the master5 design. 

There are a few considerations. Let me examine if there are other cases than these.
I am assuming the collocated table, which should reduce such cases for state (probably, if
collocated table cannot be written reliably, master must stop-the-world and fail over).
When RS contacts master to do state update, it errs on the side of caution - no state update,
no open region (or split).
Thus, except for the case of multiple masters running, we can always assume RS didn't online
the region if we don't know about it.
Then, for messages to RS, see "Note on messages"; they are idempotent so they can always be

bq.  1) State update coordination.  What is a "state updates from the outside"  Do RS's initiate
splitting on their own?  Maybe a picture would help so we can figure out if it is similar
or different from hbck-master's?

Yes, these are RS messages. They are mentioned in some operation descriptions in part 2 -
opening->opened, closing->closed; splitting, etc.

bq.  2) Single point of truth.  hbck-master tries to define what single point of truth means
by defining intended, current, and actual state data with durability properties on each kind.
What do clients look at who modifies what? 

Sorry, don't understand the question. I mean single source of truth mainly about what is going
on with the region; it is described in design considerations.
I like the idea of "intended state", however without more detailed reading I am not sure how
it works for multiple ops e.g. master recovering the region while the user intends to split
it, so the split should be executed after it's opened.

bq.  3) Table record: "if regions is out of date, it should be closed and reopened". It is
not clear in master5 how regionservers find out that they are out of date. Moreover, how do
clients talking to those RS's with stale versions know they are going to the correct RS especially
in the face of RS failures due to timeout?

On alter (and startup if failed), master tries to reopen all regions that are out of date.
Regions that are not opened with either pick up the new version when they are opened, or (e.g.
if they are now Opening with old version) master discovers they are out of date when they
are transitioned to Opened by RS, and reopens them again.

As for any case of alter on enabled table, there are no guarantees for clients.
To provide these w/o disable/enable (or logical equivalent of coordinating all close-s and
open-s), one would need some form of version-time-travel, or waiting for versions, or both.

bq.  4) region record: transition states.  This is really similar to hbck-masters current
state and intended state.  Shouldn't be defined as part of the region record?

I mention somewhere that could be done. One thing is that if several paths are possible between
states, it's useful to know which is taken.
But do note that I store user intent separately from what is currently going on, so they are
not exactly similar as far as I see.

bq.  5) Note on user operations: the forgetting thing is scary to me -- in your move split
example, what happens if an RS reads state that is forgotten?  

I think my description of this might be too vague. State is not forgotten; previous intent
is forgotten. I.e. if user does several operations in order that conflict (e.g. split and
then merge), the first one will be canceled (safely :)).
Also, RS does not read state as a guideline to what needs to be done.

bq.  6) table state machine. how do we guarantee clients are writing from the correct version
in the in failures?

The intent is to fence the WAL for region server, the way we do now. One could also use other
Perhaps I could specify it more clearly; I think the problem of making sure RS is dead is
nearly orthogonal.
In my model, due to how opening region is committed to opened, we can only be unsure when
the region is in Opened state (or similar states such as Splitting which are not present in
my current version, but will be added).
In that case, in absence of normal transition, we cannot do literally anything with the region
unless we are sufficiently sure that RS is sufficiently dead (e.g. cannot write).
So, while we ensure that RS is dead we don't reassign.
My document implies (but doesn't elaborate, I'll fix that) that master does direct Opened->Closed
direct transition only when that is true.
A state called "MaybeOpened" could be added. Let me add it...

bq.  7) region state machine.  Earlier draft hand splitting and merge cases.  Are they elided
in master5 or are not present any more. How would this get extended handle jeffrey's distributed
log replay/fast write recovery feature? 

As I mention somewhere these could be separate states. I was kind of afraid of blowing up
state machine too much, so I noticed that for split/merge you anyway store siblings/children,
so you can recognize them and for most purposes different split-merge states are the same
as Opened and Closed.

I will add those back, it would make sense.

bq.  8) logical interactions:  sounds like master5 allows concurrent region and table operations.
 hbck-master (though not fully documented) only allows certain region transitions when the
table is enabled or if the table is disabled.  Are we sure we don't get into race conditions?
 What happens if disable gets issued -- its possible for someone to reopens the region and
for old clients to continue writing to it even though it is closed?

Yes, parallelism is intended. You can never be sure you have no races but we should aim for
it :)

master5 is missing disabled/enabled check, that is a mistake.

Part1 operation interactions already cover it:

    table disable doesn't ack until all regions are closed (master5 is wrong :().
    region opening cannot start if table is already disabling or disabled.
    if region is already opening when disable is issued, opening will be opportunistically
    if disable fails to cancel opening, or server opens it first in a race, region will be
opened, and master will issue close immediately after state update. Given that region is not
closed, disable is not complete.
    if opening (or closing) times out, master will fence off RS and mark region as closed.
If there was some way of fencing region separately (ZK lease?) it would be possible to use

In any case, until client checks table state before every write, there's no easy way to prevent
writes on disabling table. Writes on disabled table will not be possible.

> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>         Attachments: Entity management in Master - part 1.pdf, hbckMasterV2-long.pdf,
Region management in Master5.docx, Region management in Master.pdf
> Need a framework to execute master-coordinated tasks in a fault-tolerant manner. 
> Master-coordinated tasks such as online-scheme change and delete-range (deleting region(s)
based on start/end key) can make use of this framework.
> The advantages of framework are
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for master-coordinated
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core components

This message was sent by Atlassian JIRA

View raw message