hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
Date Wed, 09 Oct 2013 23:40:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791007#comment-13791007 ]

Sergey Shelukhin commented on HBASE-5487:

A big response to the recent comments that hadn't been responded to yet.
Let me update the doc, EOW-ish probably, depending on the number of bugs surfacing ;)

Let's keep discussion and doc here and branch tasks out for rewrites.
bq. + The problem section is too short (state kept in multiple places and all have to agree...);
need more full list so can be sure proposal addresses them all
What level of detail do you have in mind? It's not a bug fix, so I cannot really say "merge
races with snapshot" or something like that; that could arguably also be resolved by another
100k patch to the existing AM :)
bq. + How is the proposal different from what we currently have? I see us tying regionstate
to table state. That is new. But the rest, where we have a record and it is atomically changed
looks like our RegionState in Master memory? There is an increasing 'version' which should
help ensure a 'direction' for change which should help.
See the design principles (and the discussion below :)). We are trying to avoid multiple flavors
of split-brain state.
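To make the "increasing version" idea concrete, here is a rough sketch (all names are hypothetical, not from the doc) of a single source-of-truth record where every writer must present the version it read, so a stale updater loses the race instead of forking the state:
{code:java}
import java.util.concurrent.atomic.AtomicReference;

final class RegionStateRecord {
  enum State { OFFLINE, OPENING, OPEN, CLOSING, SPLITTING }

  static final class Versioned {
    final State state;
    final long version;
    Versioned(State state, long version) { this.state = state; this.version = version; }
  }

  private final AtomicReference<Versioned> current =
      new AtomicReference<Versioned>(new Versioned(State.OFFLINE, 0));

  // Succeeds only if the caller acted on the latest version; a stale caller
  // must re-read and re-decide instead of clobbering newer state.
  boolean transition(long expectedVersion, State next) {
    Versioned cur = current.get();
    if (cur.version != expectedVersion) {
      return false;                              // stale view, reject
    }
    return current.compareAndSet(cur, new Versioned(next, cur.version + 1));
  }
}
{code}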
bq. Its fine having a source of truth but ain't the hard part bring the system along? (meta
edits, clients, etc.).
Yes :)
bq. Experience has zk as messy to reason with. It is also an indirection having RS and M go
to zk to do 'state'.
I think ZK got a bad reputation not on its own merit, but because of how we use it.
I can see that problems exist, but IMHO the advantages outweigh the disadvantages compared to a system table.
As for a co-located system table, I am not so sure; so far there isn't even a high-level design for
it (for example, do all splits have to go through the master/system table now? how does it recover?).
Perhaps we should abstract an async persistence mechanism sufficiently and then decide whether
it would be ZK+notifications, a system table, memory + WAL, a co-located system table, or
something else.
The problem is that how the master uses that interface would depend on its perf characteristics.
Anyway, we can work out the state transitions/concurrency/recovery without tying them 100% to a particular
storage.
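Something like the following is what I have in mind for the abstraction, strictly as a sketch; the interface names are made up, and ZK+notifications, a system table, or mem+WAL would plug in underneath. Keeping it async is what avoids baking one backend's latency profile into the master:
{code:java}
import java.util.concurrent.CompletableFuture;

interface StateStore {
  // Atomically replace the value iff it is still at expectedVersion;
  // completes with the new version, or exceptionally on a version mismatch.
  CompletableFuture<Long> compareAndSet(String key, byte[] value, long expectedVersion);

  // Async read of the current value + version.
  CompletableFuture<VersionedValue> get(String key);

  // Watch for changes; implementations may coalesce notifications.
  void watch(String key, Listener listener);

  interface Listener {
    void onChange(String key, VersionedValue v);
  }

  class VersionedValue {
    final byte[] value;
    final long version;
    VersionedValue(byte[] value, long version) { this.value = value; this.version = version; }
  }
}
{code}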

bq. + Agree that master should become a lib that any regionserver can run.
That sounds possible.

bq. At least, we should make this really testable, without needing to set up a zk, a set of
rs and so on.
+1, see my comment above. 
bq. I really really really think that we need to put performance as a requirement for
any implementation. For example, something like: on a cluster with 5 racks of 20 regionservers
each, with 200 regions per RS, the assignment will be completed in 1s if we lose one rack.
I saw a reference to async ZK in the doc, it's great, because the performances are 10 times better.
We can measure and improve, but I am not really sure what the exact numbers will be at
this stage (we don't even know what the storage is).

bq. A regionserver could first update the meta table, and then just notify the master that
a certain transition was done; the master could initiate the next transition (Elliott Clark
comment about coprocessor can probably be made to apply in this context). Only when a state
change is recorded in meta, the operation is considered successful.
Split, for example, requires several changes to meta. Will the master be able to see them together
from the hook? If the master is co-located in the same RS as meta, the overhead of a master
RPC should be small.
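To illustrate the visibility question: a split writes at least three meta rows (parent offlined, two daughters created), and a hook that fires per row would show the master a half-split. A batch-level hook, sketched below with made-up types (not the actual coprocessor API), would see the whole transition at once:
{code:java}
import java.util.List;

// All types here are made up to illustrate the point, not real HBase APIs.
interface MetaEditHook {
  // Called once per atomic batch of meta edits, so the observer can tell
  // "split of parent P into daughters A and B" apart from three unrelated writes.
  void onBatchCommitted(List<MetaEdit> batch);
}

class MetaEdit {
  final String regionRow;   // row key in the meta table
  final String newState;    // e.g. "SPLIT" for the parent, "OPENING" for a daughter
  MetaEdit(String regionRow, String newState) {
    this.regionRow = regionRow;
    this.newState = newState;
  }
}
{code}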

bq. Also, there is a chore (probably enhance catalog-janitor) in the master that periodically
goes over the meta table and restarts (along with some diagnostics; probing regionservers
in question etc.) failed/stuck state transitions. 
+1 on that. Transition states can carry the start ts, so the master will know when they started.
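Rough sketch of what that chore could look like, with hypothetical names (the real thing would presumably live in or next to the catalog janitor and read the start ts from meta):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TransitionJanitor {
  static final class InFlight {
    final String region;
    final String state;
    final long startTs;
    InFlight(String region, String state, long startTs) {
      this.region = region; this.state = state; this.startTs = startTs;
    }
  }

  private final Map<String, InFlight> inFlight = new ConcurrentHashMap<String, InFlight>();
  private final long timeoutMs;

  TransitionJanitor(long timeoutMs) { this.timeoutMs = timeoutMs; }

  void recordStart(String region, String state) {
    inFlight.put(region, new InFlight(region, state, System.currentTimeMillis()));
  }

  void recordDone(String region) { inFlight.remove(region); }

  // Run periodically: anything in flight past the timeout is considered
  // stuck, and we'd probe the RS / restart the transition with diagnostics.
  void sweep() {
    long now = System.currentTimeMillis();
    for (InFlight t : inFlight.values()) {
      if (now - t.startTs > timeoutMs) {
        System.out.printf("stuck: %s in %s for %d ms, probe RS / restart transition%n",
            t.region, t.state, now - t.startTs);
      }
    }
  }
}
{code}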

bq. I think we should also save the operations that was initiated by the client on the master
(either in WAL or in some system table) so that the master doesn't lose track of those and
can execute them in the face of crashes & restarts. For example, if the user had sent
a 'split region' operation and the master crashed
Yeah, "disable table" or "move region" are a good example. Probably we'd need ZK/system table/WAL
for ongoing logical operations.
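Whatever the storage ends up being, the shape of the idea would be: persist the intent before acting, mark it done after, and replay anything still pending on restart. A sketch with an in-memory stand-in for the store (names made up):
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

final class OperationJournal {
  enum Status { PENDING, DONE }

  private final Map<Long, String> ops = new LinkedHashMap<Long, String>();   // opId -> description
  private final Map<Long, Status> status = new LinkedHashMap<Long, Status>();
  private long nextId = 1;

  // Persist the intent BEFORE starting work, so a master crash cannot lose it.
  synchronized long logIntent(String op) {           // e.g. "split region R"
    long id = nextId++;
    ops.put(id, op);
    status.put(id, Status.PENDING);
    return id;
  }

  synchronized void markDone(long id) { status.put(id, Status.DONE); }

  // On master (re)start: resume every operation that never completed.
  synchronized void replayPending() {
    for (Map.Entry<Long, Status> e : status.entrySet()) {
      if (e.getValue() == Status.PENDING) {
        System.out.println("resuming op " + e.getKey() + ": " + ops.get(e.getKey()));
      }
    }
  }
}
{code}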

bq. We should not have another janitor/chore. If an action is failed, it must be because of
something unrecoverable by itself, not because of a bug in our code. It should stay failed
until the issue is resolved.
I think the failures meant here are things like an RS that went away, or is slow or buggy, so OPENING got
stuck; someone needs to pick it up after a timeout.

bq. We need to have something like FATE in accumulo to queue/retry actions taking several
steps like split/merge/move.
We basically need something that allows atomic state changes. HBase or ZK or mem+WAL fit the
bill :)
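For example, with ZK the atomic state change falls out of the znode version, which setData() takes as a compare-and-swap condition (the versioned getData/setData calls are real ZK API; the path/payload layout here is made up for illustration):
{code:java}
import java.util.Arrays;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

final class ZkStateCas {
  private final ZooKeeper zk;

  ZkStateCas(ZooKeeper zk) { this.zk = zk; }

  // Read the state + znode version, then write the next state only if nobody
  // changed it in between; BadVersion means we lost the race and must re-read.
  boolean transition(String path, byte[] expectedState, byte[] nextState)
      throws KeeperException, InterruptedException {
    Stat stat = new Stat();
    byte[] cur = zk.getData(path, false, stat);
    if (!Arrays.equals(cur, expectedState)) {
      return false;                                   // state already moved on
    }
    try {
      zk.setData(path, nextState, stat.getVersion()); // CAS on the znode version
      return true;
    } catch (KeeperException.BadVersionException e) {
      return false;                                   // concurrent writer won
    }
  }
}
{code}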

> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>            Priority: Critical
>         Attachments: Region management in Master.pdf
> Need a framework to execute master-coordinated tasks in a fault-tolerant manner.
> Master-coordinated tasks such as online schema change and delete-range (deleting region(s)
> based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and RegionServer for master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core components
