hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
Date Fri, 29 Mar 2013 18:41:17 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617600#comment-13617600 ]

Sergey Shelukhin commented on HBASE-5487:

bq. Any major-overhaul solution should make sure that these operations, when issued concurrently,
interact according to a sane set of semantics in the face of failures.
This is another (although not orthogonal) question.
I am looking for a sane way to define and enforce arbitrary semantics first. Then sane semantics
can be enforced on top of that :)
For example, the "actor-ish" model below would make it easy to write simple code; persistent
state would make sure there is a definite state at any time and that all crucial transitions are
atomic, so semantics would be easy to enforce as long as the code can handle a failed transition/recovery.
Locks also make this simple, although locks have other problems imho.
We can go either way, though: once we define sane semantics, it will be easy to see how convenient
they are to implement in a particular model.
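To make "definite state at any time, and all crucial transitions are atomic" concrete, here is a minimal sketch. It is hypothetical, not the model from the attached doc: `RegionState`, `RegionStateStore` and the transition table are made-up names, and a `ConcurrentHashMap` compare-and-set stands in for a conditional (versioned) write to whatever persistent store is used.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified region states for this sketch only.
enum RegionState { CLOSED, OPENING, OPEN, CLOSING }

// Hypothetical stand-in for persistent state (ZK, a system table, ...).
// ConcurrentHashMap.replace(key, expected, next) is an atomic compare-and-set,
// playing the role of a conditional (versioned) write to durable storage.
class RegionStateStore {
    private final ConcurrentHashMap<String, RegionState> states = new ConcurrentHashMap<>();

    void create(String region) {
        states.putIfAbsent(region, RegionState.CLOSED);
    }

    RegionState get(String region) {
        return states.get(region);
    }

    // Moves the region from 'from' to 'to' only if it is currently in 'from'.
    // If another actor got there first, this returns false and the caller
    // must re-read the state and decide what to do (retry, give up, recover).
    boolean transition(String region, RegionState from, RegionState to) {
        if (!isLegal(from, to)) {
            throw new IllegalArgumentException("illegal transition " + from + " -> " + to);
        }
        return states.replace(region, from, to);
    }

    // The whole state machine in one place.
    private static boolean isLegal(RegionState from, RegionState to) {
        switch (from) {
            case CLOSED:  return to == RegionState.OPENING;
            case OPENING: return to == RegionState.OPEN || to == RegionState.CLOSED; // open succeeded / failed
            case OPEN:    return to == RegionState.CLOSING;
            case CLOSING: return to == RegionState.CLOSED;
            default:      return false;
        }
    }
}

class RegionStateDemo {
    public static void main(String[] args) {
        RegionStateStore store = new RegionStateStore();
        store.create("r1");
        boolean first = store.transition("r1", RegionState.CLOSED, RegionState.OPENING);
        // A second, concurrent "open" of the same region loses the race.
        boolean second = store.transition("r1", RegionState.CLOSED, RegionState.OPENING);
        System.out.println(first + " " + second + " " + store.get("r1")); // true false OPENING
    }
}
```

The point of the design is that a losing actor never ends up with a half-applied transition: it either moved the state or it did not, and recovery code only has to look at the current state.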

bq. So I buy open/close as a region operation. split/merge are multi region operations –
is there enough state to recover from a failure?
There should be. Can you elaborate?

bq. So alter table is a region operation? Why isn't it in the state machine?
Alter table is currently an operation that involves region operations, namely open/close.
Open/close are in the state machine :) As for tables, I am not sure a state machine is the best
model for table state; there isn't that much going on with the table that is properly an exclusive
bq. Implementing region locks is too far – I'm asking for some back of the napkin discussion.
If a server holds the lock for a region for a total of Tlock during each day, and the number of regions
is N, then the probability of some region lock (or table read-only lock) being held at any given time
is 1 - (1 - Tlock/Tday)^N, if I am writing this correctly (assuming locks on different regions are
held independently). For 5 seconds of locking per day per region and 10000 regions (not unreasonable
for a large table/cluster), we will be holding some lock about 44% of the time for region operations.
Calculating the probability of any lock being in recovery (a server went down while holding a lock,
less than the recovery time ago) can also be done, but the numbers for some parameters (how often do
servers go down?) will be very speculative...
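A quick way to double-check the arithmetic above. This sketch uses the same independence assumption as the formula and takes Tday = 86400 seconds; `LockProbability` is just an illustrative name.

```java
class LockProbability {
    static final double SECONDS_PER_DAY = 24 * 60 * 60;

    // P(at least one of n independently held region locks is held at a random
    // instant), where each lock is held for tLockSeconds in total per day:
    // 1 - (1 - Tlock/Tday)^N
    static double probabilityAnyLockHeld(double tLockSeconds, int n) {
        double pOne = tLockSeconds / SECONDS_PER_DAY;
        return 1.0 - Math.pow(1.0 - pOne, n);
    }

    public static void main(String[] args) {
        // 5 seconds of locking per region per day, 10000 regions: about 0.44
        System.out.printf("%.2f%n", probabilityAnyLockHeld(5, 10_000));
    }
}
```

Since N * Tlock/Tday is about 0.58 here, the result is close to 1 - e^(-0.58), which is why the number grows so quickly with the region count.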

bq. I think we need some measurements of how much throughput we can get in ZK or with a ZK-lock
implementation and compare this with # of RS watchers * # of regions * number of ops...
Will there be many watchers/ops? You only watch and do ops when you acquire the lock, so unless
region operations are very frequent... 
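Following the point that ZK traffic is generated only when a lock is acquired, here is a back-of-the-napkin load estimate. It assumes the standard ZooKeeper lock recipe, where an uncontended acquire/release costs roughly three operations (create an ephemeral sequential node, getChildren, delete on release) and a waiter watches only its predecessor node, so the watcher count does not multiply the load. The per-region lock rate is a made-up parameter, and `ZkLockLoad` is an illustrative name.

```java
class ZkLockLoad {
    // Assumption: uncontended acquire + release in the ZooKeeper lock recipe is
    // create(EPHEMERAL_SEQUENTIAL) + getChildren + delete = 3 operations.
    static final int OPS_PER_UNCONTENDED_LOCK = 3;

    // Average ZK operations per second caused by region locking, given
    // 'regions' regions, each locked 'locksPerRegionPerDay' times a day.
    static double zkOpsPerSecond(long regions, double locksPerRegionPerDay) {
        return regions * locksPerRegionPerDay * OPS_PER_UNCONTENDED_LOCK / 86_400.0;
    }

    public static void main(String[] args) {
        // Hypothetical load: 10000 regions, each locked 10 times a day (about 3.47 ops/s).
        System.out.printf("%.2f ops/s%n", zkOpsPerSecond(10_000, 10));
    }
}
```

Contention adds extra exists/getChildren calls per waiter, so the real number depends on how often two operations target the same region at once.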

bq. The current regions-in-transition (RIT) code basically assumes that an absent znode is
either closed or opened. RIT znodes are present when the region is in the in-between states
(opening, closing,
I don't think "either closed or opened" is good enough :) Also, RITs don't cover all scenarios
and things like table ops don't use them at all.

bq. I know I've suggested something like this before. Currently the RS initiates a split,
and does the region open/meta changes. If there are errors, at some point the master side
detects a timeout. An alternative would have splits initiated on the RS but have the master
do some kind of atomic changes to meta and region state for the 3 involved regions (parent,
daughter a and daughter b).
Yeah, although in other models (locks, persistent state) that is not required. Also, if meta
is a cache for clients rather than the source of truth, meta changes can still be made on the server; I assume
by meta you mean the global state, wherever that is?
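A minimal sketch of the "master does atomic changes for the 3 involved regions" alternative, assuming a hypothetical in-memory meta (`SplitMeta`, `MetaRowState` are made-up names, not HBase classes). A single monitor makes the three-row update all-or-nothing here; real storage would need an equivalent atomic multi-row write.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical meta row states for this sketch only.
enum MetaRowState { ONLINE, OFFLINE, SPLIT_PARENT }

// Hypothetical in-memory stand-in for meta. A single monitor makes the
// three-row split update all-or-nothing as seen by other callers.
class SplitMeta {
    private final Map<String, MetaRowState> rows = new HashMap<>();

    synchronized void addOnline(String region) {
        rows.put(region, MetaRowState.ONLINE);
    }

    synchronized MetaRowState get(String region) {
        return rows.get(region);
    }

    // Invoked on the master once the RS has prepared the split.
    // Either all three rows change, or nothing is written.
    synchronized boolean commitSplit(String parent, String daughterA, String daughterB) {
        boolean ok = rows.get(parent) == MetaRowState.ONLINE
                && !rows.containsKey(daughterA)
                && !rows.containsKey(daughterB);
        if (!ok) {
            return false; // precondition failed (e.g. a retried or concurrent split)
        }
        rows.put(parent, MetaRowState.SPLIT_PARENT);
        rows.put(daughterA, MetaRowState.OFFLINE); // daughters are opened afterwards
        rows.put(daughterB, MetaRowState.OFFLINE);
        return true;
    }
}

class SplitDemo {
    public static void main(String[] args) {
        SplitMeta meta = new SplitMeta();
        meta.addOnline("parent");
        System.out.println(meta.commitSplit("parent", "daughterA", "daughterB")); // true
        System.out.println(meta.commitSplit("parent", "daughterA", "daughterB")); // false: already split
        System.out.println(meta.get("parent")); // SPLIT_PARENT
    }
}
```

Because the commit checks its precondition, a retry after a timeout is harmless: it either finds the split already done or does it once.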

bq. We need to be careful about ZK – since it is a network connection also, exceptions could
be failures or timeouts (which succeed but weren't able to ack). If we can describe the properties
(durable vs erasable) and assumptions (if the wipeable ZK is source of truth, how do we make
sure the version state is recoverable without time travel?)
The former applies to any distributed state; as for the latter, I was thinking of ZK + a "WAL"
if we intend to keep ZK wipeable.
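One way to read "ZK + WAL": every change is first appended to a durable log and only then applied to the wipeable store, so a wiped ZK can be rebuilt by replay and versions never go backwards. The sketch below is hypothetical (`WalBackedStore` is a made-up name); the map stands in for ZK and the list stands in for a durable log such as a file on HDFS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "ZK + WAL": the erasable map stands in for ZK and
// the list stands in for a durable log.
class WalBackedStore {
    // One logged mutation; seq is a monotonically increasing version.
    static final class Entry {
        final long seq;
        final String key;
        final String value;

        Entry(long seq, String key, String value) {
            this.seq = seq;
            this.key = key;
            this.value = value;
        }
    }

    private final List<Entry> wal = new ArrayList<>(); // durable: survives a wipe
    private Map<String, String> zk = new HashMap<>();   // erasable
    private long nextSeq = 0;

    void put(String key, String value) {
        wal.add(new Entry(nextSeq++, key, value)); // 1. log the change durably first
        zk.put(key, value);                        // 2. then apply it to the erasable store
    }

    String get(String key) {
        return zk.get(key);
    }

    long lastVersion() {
        return nextSeq - 1; // -1 if nothing was written yet
    }

    // Simulates losing all ZK data.
    void wipe() {
        zk = new HashMap<>();
    }

    // Rebuilds the erasable store by replaying the log in order. The result
    // contains exactly the logged changes, and versions continue from the
    // last logged one, so nothing goes back in time.
    void recover() {
        Map<String, String> rebuilt = new HashMap<>();
        long last = -1;
        for (Entry e : wal) {
            rebuilt.put(e.key, e.value);
            last = e.seq;
        }
        zk = rebuilt;
        nextSeq = last + 1;
    }
}

class WalDemo {
    public static void main(String[] args) {
        WalBackedStore store = new WalBackedStore();
        store.put("table1", "ENABLED");
        store.put("table1", "DISABLED");
        store.wipe();
        System.out.println(store.get("table1")); // null: ZK data is gone
        store.recover();
        System.out.println(store.get("table1") + " @ version " + store.lastVersion()); // DISABLED @ version 1
    }
}
```

This does not by itself solve the "succeeded but could not ack" case; that still needs idempotent or version-checked writes on top.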

> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>         Attachments: Region management in Master.pdf
> Need a framework to execute master-coordinated tasks in a fault-tolerant manner. 
> Master-coordinated tasks such as online-schema change and delete-range (deleting region(s)
> based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for master-coordinated
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core components
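The "easy to plug in new master-coordinated tasks" goal could look roughly like the following. This is a hypothetical sketch, not code from the attached design doc or from HBase: `MasterTask` and `TaskFramework` are made-up names, tasks are modeled as ordered idempotent steps, and the in-memory progress map stands in for state that would have to be persisted (e.g. in ZK) to survive a master failover.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in contract: a task is an ordered list of idempotent steps.
interface MasterTask {
    String name();
    List<Runnable> steps();
}

// Hypothetical framework core: records progress per task so that, after a
// failure, execution resumes at the first step not yet completed.
class TaskFramework {
    private final Map<String, MasterTask> registry = new HashMap<>();
    private final Map<String, Integer> completedSteps = new HashMap<>(); // stand-in for persisted progress

    void register(MasterTask task) {
        registry.put(task.name(), task);
    }

    void run(String name) {
        MasterTask task = registry.get(name);
        if (task == null) {
            throw new IllegalArgumentException("unknown task " + name);
        }
        List<Runnable> steps = task.steps();
        for (int i = completedSteps.getOrDefault(name, 0); i < steps.size(); i++) {
            steps.get(i).run();
            completedSteps.put(name, i + 1); // record progress after each step
        }
    }
}

class FrameworkDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean[] failOnce = {true};
        MasterTask deleteRange = new MasterTask() {
            public String name() { return "delete-range"; }
            public List<Runnable> steps() {
                return List.of(
                    () -> log.add("close regions"),
                    () -> {
                        if (failOnce[0]) { failOnce[0] = false; throw new IllegalStateException("simulated crash"); }
                        log.add("delete regions");
                    },
                    () -> log.add("update meta"));
            }
        };
        TaskFramework framework = new TaskFramework();
        framework.register(deleteRange);
        try { framework.run("delete-range"); } catch (IllegalStateException e) { log.add("failed"); }
        framework.run("delete-range"); // resumes at the failed step; "close regions" is not repeated
        System.out.println(log); // [close regions, failed, delete regions, update meta]
    }
}
```

A new task type then only needs a `MasterTask` implementation; the retry/resume logic lives in one place instead of being repeated per feature.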


View raw message