hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
Date Wed, 27 Mar 2013 23:57:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615901#comment-13615901
] 

Sergey Shelukhin commented on HBASE-5487:
-----------------------------------------

bq. Is the assignment manager the only "master coordinated" task in scope?
Only for the current document version... tables could be added.
bq. Instead of asserting it is not clear if table (+region) locks scale, let's find out.
Hmm... that would require implementing region locks, and having a very large cluster. I am
talking more about unacceptable blocking of user operations, and management of expiring locks
in the presence of real-life failures.
bq. Master operations and processes can clash and we should understand where we need concurrency
control. (I'm working on a table – here's a draft distilled version [1], there exists an
overly detailed version that I'll share once I get it fixed)
Comments below.
bq. Should there be a notion of queuing operations? (locking, or an actual queue) Should these
operations be generically logged so they can complete if a master goes down in the middle?
(ex: master goes down during a "move" operation after the close but before the open on the
new rs).
You mean like WAL for operations?
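
To make the "WAL for operations" idea concrete, here is a minimal sketch of a move whose steps are logged so that a new master can finish it. Everything in it is an assumption for illustration: OperationLog, MOVE_STEPS, run_move and the crash_after parameter are hypothetical names, the log is an in-memory list standing in for durable storage, and none of it is HBase code.

```python
from dataclasses import dataclass, field

# Hypothetical step names for a "move": close on the old RS, then open on the new RS.
MOVE_STEPS = ["close_on_source", "open_on_target"]


@dataclass
class OperationLog:
    """In-memory stand-in for a durable log of finished operation steps."""
    entries: list = field(default_factory=list)

    def append(self, op_id, step):
        self.entries.append((op_id, step))

    def completed_steps(self, op_id):
        return {step for (logged_id, step) in self.entries if logged_id == op_id}


def run_move(log, op_id, execute, crash_after=None):
    """Run the steps of a move that the log does not yet show as finished.

    `execute` performs one step. `crash_after` simulates the master dying
    right after that step has been logged. Returns True if the function
    ran to the end without a simulated crash.
    """
    done = log.completed_steps(op_id)
    for step in MOVE_STEPS:
        if step in done:
            continue  # already finished before the failure, skip on recovery
        execute(step)
        log.append(op_id, step)
        if step == crash_after:
            return False
    return True


log = OperationLog()
executed = []
# The first master dies after the close but before the open.
run_move(log, "move-1", executed.append, crash_after="close_on_source")
# A new master reads the same log and performs only the missing open.
run_move(log, "move-1", executed.append)
```

In this sketch a step is logged only after it has run, so a crash between the two would cause the step to be executed again on recovery; the steps would have to be idempotent for that to be safe.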

bq. The "design principles" is actually more of a proposed design.
Yeah, sorry, wanted to split it into two sections but never did. Will rename.

bq. how do we deal with operations where we need "locks" on multiple region because we are
reading or modifying multiple regions – e.g. splits, merges, snapshots? Matteo Bertozzi
had suggested in another jira making a the meta row per table, or maybe part of the solution
is using the multi-row single meta region transaction.
Depends on where we store it, but yeah these have to be transactional. Last section (very
short :)) suggests using ZK, which already supports that.


bq. What are alternatives? why this approach vs others?
I can expand the doc... the implicitly mentioned existing alternatives are locks, which I
would argue scale worse and are harder to manage; or the transaction approach that is currently
used (although not unified), for example via transient transaction nodes.

Actually, one alternative approach I saw used for such things is to simplify concurrency of
operations/etc. with an actor-like model, where the master has the logical cluster state and a
previously saved target state, and periodically (often) takes an epic lock, looks at them quickly,
and based on what it is doing, outputs a new target cluster state and a list of physical things
to do. Then it releases the epic lock, the new target state is saved, and the operations are
performed. That way all state-management code becomes simple, because it runs in one place with
no concurrency, and recovery just has to compare the real cluster state with the target state.
But this will require thinking about this differently.
Also, usually that would mean RSes won't be able to initiate operations (like split) - they
will have to go through the master (which I would argue is ok).
Also, it's not clear whether this will become too much of a bottleneck.
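
A minimal sketch of the loop described above, under stated assumptions: cluster state is modeled as a plain region-to-server dict, and plan, Master, tick and saved_target are hypothetical names chosen for illustration, not an existing HBase API. Saving the target is a simple attribute assignment here; a real system would persist it.

```python
import threading


def plan(actual, target, requests):
    """Compute the new target state and the physical actions needed to reach it.

    actual, target and requests all map region name -> server name.
    """
    new_target = {**target, **requests}
    actions = []
    for region in sorted(new_target):
        want = new_target[region]
        have = actual.get(region)
        if have == want:
            continue  # region is already where the target says it should be
        if have is not None:
            actions.append(("close", region, have))
        actions.append(("open", region, want))
    return new_target, actions


class Master:
    def __init__(self, saved_target):
        self._epic_lock = threading.Lock()  # the single coarse lock
        self.saved_target = dict(saved_target)

    def tick(self, actual, requests=None):
        # All state-management logic runs here, in one place, under one lock.
        with self._epic_lock:
            new_target, actions = plan(actual, self.saved_target, requests or {})
        # Lock released: save the new target, then hand back the work to perform.
        self.saved_target = new_target
        return actions


master = Master({"r1": "rs1", "r2": "rs2"})
# A request to move r2 to rs3 yields a close followed by an open.
move_actions = master.tick({"r1": "rs1", "r2": "rs2"}, {"r2": "rs3"})

# Recovery: r2 was closed but never opened. A new master started from the saved
# target only compares real state with target state and emits the missing open.
recovery_actions = Master(master.saved_target).tick({"r1": "rs1"})
```

With these inputs, move_actions is the close of r2 on rs2 followed by the open on rs3, and recovery_actions contains only the open on rs3, which is the "compare real state with target state" recovery from the paragraph above.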


bq. Where do you think the new information will be, META table?
It seems to me that ZK would be better (see last section), but META is also an option.

From the spreadsheet:
bq. Enabling and disabling table operations should be blocked when  any of these simple region
operations are in progress
Not clear why (logically).
bq. move
Move is close and open, doesn't require consistency, right?
bq. Regionserver Processes ... However, the individual operations must  maintain the table
integrity property.
Not clear what this means for snapshots.
  
                
> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>         Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant manner. 
> Master-coordinated tasks such as online schema change and delete-range (deleting region(s)
based on start/end key) can make use of this framework.
> The advantages of framework are
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for master-coordinated
tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core components

