hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Backup HMasters will go down if the zk connection expires without recovery
Date Thu, 20 Mar 2014 18:35:37 GMT
Why did the backup master's ZooKeeper session expire? That indicates a
problem somewhere on the network or with ZooKeeper.

The active master and regionservers also shut down when their sessions
expire. If our ZooKeeper session expires, we have been partitioned and, from
our vantage point, are highly uncertain about the state of the world. We
shut down to avoid accidentally taking incorrect actions based on bad or
out-of-date state. This simplifies the design and removes corner cases. In
a production environment I would expect a site-local strategy (daemontools,
for example) for automatic service recovery, if that is desired.
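
To make the policy above concrete, here is a minimal sketch in plain Java.
It is not HBase or ZooKeeper code: SessionEvent, Action and
SessionExpiryPolicy are hypothetical names made up for illustration. The
event names are loosely modeled on the ZooKeeper client states Disconnected
(the client may still reconnect within the session timeout) and Expired (the
session is gone for good).

```java
// Hypothetical illustration only; none of these types exist in HBase or ZooKeeper.

/** Simplified session events, loosely modeled on ZooKeeper client states. */
enum SessionEvent { CONNECTED, DISCONNECTED, EXPIRED }

/** What the process does in response to a session event. */
enum Action { CONTINUE, WAIT_FOR_RECONNECT, ABORT }

final class SessionExpiryPolicy {
    private SessionExpiryPolicy() {}

    /**
     * Maps a session event to an action. The same rule applies to every
     * role (active master, backup master, regionserver): once the session
     * has expired, our view of the cluster may be stale, so we stop rather
     * than act on it.
     */
    static Action decide(SessionEvent event) {
        switch (event) {
            case CONNECTED:
                return Action.CONTINUE;
            case DISCONNECTED:
                // The session may still be valid; wait for the client to reconnect.
                return Action.WAIT_FOR_RECONNECT;
            case EXPIRED:
                // Session lost: shut down and let an external supervisor restart us.
                return Action.ABORT;
            default:
                throw new IllegalArgumentException("Unknown event: " + event);
        }
    }
}
```

With this shape, restarting after an ABORT is left to whatever supervises
the process, which keeps recovery logic out of the daemon itself.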

On Thu, Mar 20, 2014 at 12:43 AM, Du, Jingcheng <jingcheng.du@intel.com> wrote:

> Dear Devs,
>   I have run into a problem in the HMaster.
>   I currently run multiple HMasters in a cluster. If the ZK connection of
> one of the backup HMasters expires, that backup HMaster goes down
> immediately without recovering the ZK connection.
> I saw the code below in HMaster.abortNow(); the fail.fast setting
> only applies to the active HMaster. Should the backup ones
> recover if the ZK connection expires? Please advise. Thanks.
> if (!this.isActiveMaster || this.stopped) {
>   return true;
> }
> boolean failFast = conf.getBoolean("fail.fast.expired.active.master", false);
> Regards,
> Jingcheng

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
