hbase-issues mailing list archives

From "Jimmy Xiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
Date Sat, 12 Oct 2013 18:34:48 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793436#comment-13793436
] 

Jimmy Xiang commented on HBASE-5487:
------------------------------------

[~fenghh], on the uncertainty caused by ZK: I don't think it comes from the way we use it.
It is more because ZK doesn't support continuous events.  You have to set the watch again
after each event callback.  The problem is that after an event is triggered, by the time we
read the data, the data may have changed again, so an event is missed, which causes a state
jump.
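
To make the missed-event window concrete, here is a minimal sketch. It is a toy in-memory
simulation of a one-shot watch, not the ZooKeeper client API; the class names (OneShotWatchDemo,
Node, Watcher) and the state strings are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy simulation of a one-shot watch; NOT the ZooKeeper client API. */
class OneShotWatchDemo {
    interface Watcher { void changed(); }

    /** A node whose watch is cleared as soon as it fires (one-shot semantics). */
    static class Node {
        private String data;
        private Watcher watcher;

        Node(String initial) { data = initial; }

        /** Reads the data and registers (or re-registers) the watch. */
        String getData(Watcher w) { watcher = w; return data; }

        /** Writes the data and fires the watch at most once. */
        void setData(String d) {
            data = d;
            Watcher w = watcher;
            watcher = null;            // the watch is gone after one event
            if (w != null) w.changed();
        }
    }

    /** Returns the states the observer actually sees. */
    static List<String> observe() {
        Node node = new Node("OFFLINE");
        List<String> seen = new ArrayList<>();
        boolean[] fired = {false};
        Watcher w = () -> fired[0] = true;

        seen.add(node.getData(w));     // sees OFFLINE, watch is set
        node.setData("OPENING");       // watch fires and is cleared
        node.setData("OPENED");        // no watch is set: this change is unnoticed
        if (fired[0]) {
            fired[0] = false;
            seen.add(node.getData(w)); // reads the latest data only: OPENED
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe()); // prints [OFFLINE, OPENED]
    }
}
```

The observer goes from OFFLINE straight to OPENED and never sees OPENING, which is the kind
of state jump described above.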

Currently, we do have a region state machine.  However, the machine is not strict because of
the ZK issue.  We can jump over some states, which means the state transitions can't be
strictly enforced.  If we go without ZK, we can have a strict state machine to follow. That
will make things much more predictable.
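
As a rough illustration of what "strict" could mean here, the sketch below rejects any
transition that is not in an explicit table. The state names and the allowed transitions are
assumptions made up for this example; they are not the actual HBase region states or the
design proposed in this issue.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical strict state machine: every transition must be listed in ALLOWED. */
class StrictRegionStateMachine {
    // Illustrative states only, not the real HBase region state set.
    enum State { OFFLINE, OPENING, OPEN, CLOSING, CLOSED }

    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.OFFLINE, EnumSet.of(State.OPENING));
        ALLOWED.put(State.OPENING, EnumSet.of(State.OPEN, State.OFFLINE));
        ALLOWED.put(State.OPEN,    EnumSet.of(State.CLOSING));
        ALLOWED.put(State.CLOSING, EnumSet.of(State.CLOSED));
        ALLOWED.put(State.CLOSED,  EnumSet.of(State.OFFLINE));
    }

    private State current = State.OFFLINE;

    State current() { return current; }

    /** Moves to the next state, or throws if the step would skip or reorder states. */
    void transition(State next) {
        if (!ALLOWED.get(current).contains(next)) {
            throw new IllegalStateException(current + " -> " + next + " is not allowed");
        }
        current = next;
    }
}
```

With such a table, a jump like OFFLINE -> OPEN fails loudly instead of being silently
accepted, so every observer can rely on the intermediate states having happened.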

[~sershe], on the janitor: I don't think we need it.  Currently, we have a timeout monitor,
but it is disabled and I think it will be removed soon.  Without the monitor, ITBLL
(IntegrationTestBigLinkedList) with CM (ChaosMonkey) runs very well. With the 0.96 tip, I ran
ITBLL with CM with aggressive region moving, and it was perfectly fine. If an RS is gone, SSH
(ServerShutdownHandler) should handle it properly and assign its regions.  If there is a
janitor, it will compete with SSH in this case, which probably does more harm than good.

To make some RS serve the role of master, besides hosting meta on it, we can host some
(not all, of course, to make [~jesse_yates] happy :) ) system tables on it too. This way,
we can support leveled region assignment, i.e. we can open some regions before the rest, if
these regions can be assigned to the master RS; or we can open them on the master RS first,
then move them away later, after the system is fully started. This applies to some special
regions only, of course.

Now, we bundle two important modules (master + meta) in one RS. It is critical to make sure it
has a light load and does not die too often (even better, does not die at all). So I think we
should move other regions off the RS once it's promoted to be the master.

I think we should allow only a list of RSs with good hardware to be master, if not all RS nodes
have decent/identical hardware.


> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>            Priority: Critical
>         Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant manner.
> Master-coordinated tasks such as online schema change and delete-range (deleting region(s)
> based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core components



--
This message was sent by Atlassian JIRA
(v6.1#6144)
