Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 8 Feb 2012 18:40:59 +0000 (UTC)
From: "Jesse Yates (Commented) (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: 
 <1878450710.15808.1328726459831.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: 
 <952564254.15765.1328725981536.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-5353) HA/Distributed HMaster via
 RegionServers
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203829#comment-13203829 ] 

Jesse Yates commented on HBASE-5353:
------------------------------------

I was thinking about this, and it doesn't seem that hard to have the regionservers run a leader election via ZK to select the one (or the top 'n') that would spin up a master instance on its local machine. Those new masters would then run their own leader election in ZK to determine which is the current 'official' HMaster; the others would act as hot failovers. If a master dies, the next regionserver in the list would spin up a master instance, so that we always have a certain number of hot masters (cascading failure is clearly a problem here, but if that happens, you have bigger problems). Running the master in the same JVM as the regionserver is probably a bad idea, but you could potentially use the startup scripts to spin up a separate JVM for the master.

This also requires some modification to the client so that it keeps track of the current master, but that should be fairly trivial, since the client already has the ZK connection (or can fail and then look up the new master).
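
To make the election scheme above concrete, here is a rough in-memory sketch. It is not HBase or ZooKeeper code: a TreeMap keyed by an increasing sequence number stands in for ZK ephemeral sequential nodes, and every name (MasterElectionSketch, join, fail, masterHosts, activeMaster) is made up for illustration. It also assumes the second-level election among the masters simply picks the longest-lived surviving host, which is only one way it could work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** In-memory stand-in for the two-level election; uses no real ZooKeeper or HBase API. */
class MasterElectionSketch {
    // Sequence number -> regionserver name, mimicking ephemeral sequential nodes.
    private final TreeMap<Long, String> candidates = new TreeMap<>();
    private final int hotMasters; // the 'n' in "top 'n' rs"
    private long nextSeq = 0;

    MasterElectionSketch(int hotMasters) {
        if (hotMasters < 1) {
            throw new IllegalArgumentException("need at least one master");
        }
        this.hotMasters = hotMasters;
    }

    /** A regionserver joins the election and gets the next sequence number. */
    void join(String regionServer) {
        candidates.put(nextSeq++, regionServer);
    }

    /** A regionserver (and any master it hosts) dies; its entry vanishes. */
    void fail(String regionServer) {
        candidates.values().remove(regionServer);
    }

    /** The first n surviving regionservers, in join order, each host a master. */
    List<String> masterHosts() {
        List<String> hosts = new ArrayList<>();
        for (String rs : candidates.values()) {
            if (hosts.size() == hotMasters) {
                break;
            }
            hosts.add(rs);
        }
        return hosts;
    }

    /** Host of the 'official' master (assumed: oldest surviving host), or null if none. */
    String activeMaster() {
        List<String> hosts = masterHosts();
        return hosts.isEmpty() ? null : hosts.get(0);
    }

    public static void main(String[] args) {
        MasterElectionSketch election = new MasterElectionSketch(2);
        for (String rs : new String[] {"rs1", "rs2", "rs3", "rs4"}) {
            election.join(rs);
        }
        System.out.println(election.masterHosts() + " active=" + election.activeMaster());
        election.fail("rs1"); // the active master's host dies
        System.out.println(election.masterHosts() + " active=" + election.activeMaster());
    }
}
```

With n = 2, losing rs1 promotes the hot failover on rs2 to active, and rs3 becomes the next host to spin up a master, so the number of hot masters stays constant as long as regionservers remain. A real version would need ZK watches and session handling, which this sketch leaves out entirely.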
                
> HA/Distributed HMaster via RegionServers
> ----------------------------------------
>
>                 Key: HBASE-5353
>                 URL: https://issues.apache.org/jira/browse/HBASE-5353
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.94.0
>            Reporter: Jesse Yates
>            Priority: Minor
>
> Currently, the HMaster node must be considered a 'special' node (single point of failure), meaning that the node must be protected more than the other commodity machines. It should be possible to instead have the HMaster be much more available, either in a distributed sense (meaning a big rewrite) or with multiple instances and automatic failover.
