zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Heffner (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ZOOKEEPER-2791) Quorum doesn't recover after zxid rollover
Date Wed, 24 May 2017 15:04:04 GMT
Mike Heffner created ZOOKEEPER-2791:
---------------------------------------

             Summary: Quorum doesn't recover after zxid rollover
                 Key: ZOOKEEPER-2791
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2791
             Project: ZooKeeper
          Issue Type: Bug
          Components: leaderElection, quorum
    Affects Versions: 3.4.8, 3.3.6
         Environment: Ubuntu 14.04.4 LTS, AWS EC2, 5 node ensembles
            Reporter: Mike Heffner


When zxid rolls over the ensemble is unable to recover without manually restarting the cluster.
The leader enters shutdown() state when zxid rolls over, but the remaining four nodes in the
ensemble are not able to re-elect a new leader. This state has persisted for at least 15 minutes
before an operator manually restarted the cluster and the ensemble recovered.

Config:
--------
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/raid0/zookeeper
clientPort=2181
maxClientCnxns=100
autopurge.snapRetainCount=14
autopurge.purgeInterval=24
leaderServes: True
server.7=172.26.134.88:2888:3888
server.6=172.26.136.143:2888:3888
server.5=172.26.135.103:2888:3888
server.4=172.26.134.16:2888:3888
server.9=172.26.135.19:2888:3888

Logs:

https://gist.github.com/mheffner/d615d358d4a360ae56a0d0a280040640




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message