hbase-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5353) HA/Distributed HMaster via RegionServers
Date Wed, 08 Feb 2012 22:22:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204058#comment-13204058

Todd Lipcon commented on HBASE-5353:

bq. The cluster knows about it, so you can have a link on the webui to the master or any of
the region servers

And each of the potential masters publishes metrics to Ganglia, so if you want to find the
master metrics, you have to hunt around in the Ganglia graphs for which master was active
at that time?
And any cron jobs or Nagios alerts you write need to first call some HBase utility to find
the active master's IP via ZK in order to get to it?

bq. True, but if those masters fail over, then your cluster management needs to be aware enough
of that to provision more, on different servers

If you have two masters on separate racks, and you have any reasonable monitoring, then your
ops team will restart or provision a new one when they fail. I've never ever heard of this
kind of scenario being a major cause of downtime.

The whole thing seems like a bad idea to me. I won't -1, but consider me -0.5.

> HA/Distributed HMaster via RegionServers
> ----------------------------------------
>                 Key: HBASE-5353
>                 URL: https://issues.apache.org/jira/browse/HBASE-5353
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.94.0
>            Reporter: Jesse Yates
>            Priority: Minor
> Currently, the HMaster node must be considered a 'special' node (single point of failure),
> meaning that the node must be protected more than the other commodity machines. It should
> be possible to instead have the HMaster be much more available, either in a distributed sense
> (meaning a big rewrite) or with multiple instances and automatic failover.
