hbase-dev mailing list archives

From "Andrew Kyle Purtell (Jira)" <j...@apache.org>
Subject [jira] [Created] (HBASE-23206) ZK quorum redundancy with failover in RZK
Date Wed, 23 Oct 2019 17:36:00 GMT
Andrew Kyle Purtell created HBASE-23206:

             Summary: ZK quorum redundancy with failover in RZK
                 Key: HBASE-23206
                 URL: https://issues.apache.org/jira/browse/HBASE-23206
             Project: HBase
          Issue Type: Brainstorming
            Reporter: Andrew Kyle Purtell

We have faced a few production issues where the ZooKeeper quorum serving the cluster has
not been as reliable as expected. The most recent one was essentially ZOOKEEPER-2164
(and related: ZOOKEEPER-900). These can be mitigated by a ZK server configuration change, but
the incidents suggest it may be worth thinking about how to be less reliant on the service
provided by a single ZK quorum instance.

A holistic solution would have several parts:
- HBASE-18095 to get ZK dependencies out of the client
- Related HBase replication improvements to track peer and position state in HBase tables
instead of znodes
- This brainstorming...

For this part, we could consider the possibility that RecoverableZooKeeper (RZK) might be
taught how to speak to two separate ZK quorums redundantly, and continue to offer service even
if one of them is temporarily unable to provide service.
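
To make the idea concrete, here is a minimal sketch of read failover across two quorums.
It is only an illustration of the concept, not a proposed patch: QuorumClient and
readWithFailover are hypothetical names that exist in neither HBase nor ZooKeeper, and
the sketch assumes each quorum connection can be reduced to a single read call.

```java
import java.util.List;

final class RedundantQuorumSketch {
    /** Hypothetical minimal view of one ZK quorum connection; not an HBase or ZooKeeper API. */
    interface QuorumClient {
        byte[] getData(String path) throws Exception;
    }

    /** Tries each quorum in order and returns the first successful read. */
    static byte[] readWithFailover(List<QuorumClient> quorums, String path) throws Exception {
        Exception last = null;
        for (QuorumClient quorum : quorums) {
            try {
                return quorum.getData(path);
            } catch (Exception e) {
                last = e; // this quorum could not serve the read; try the next one
            }
        }
        // Every quorum failed (or the list was empty); surface the last failure as the cause.
        throw new Exception("no quorum could serve " + path, last);
    }

    public static void main(String[] args) throws Exception {
        // Simulate a primary quorum that is down and a secondary that is healthy.
        QuorumClient primary = path -> { throw new java.io.IOException("primary unavailable"); };
        QuorumClient secondary = path -> "value-from-secondary".getBytes();
        byte[] data = readWithFailover(List.of(primary, secondary), "/hbase/example");
        System.out.println(new String(data));
    }
}
```

This covers only reads. Writes, watches, and ephemeral nodes would need some way to keep
the two quorums consistent with each other, which this sketch does not attempt.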

This message was sent by Atlassian Jira
