zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Deepak Vohra <dvohr...@yahoo.com.INVALID>
Subject Re: Zookeeper will not join the cluster after removing database post corruption
Date Fri, 15 Jan 2016 19:03:24 GMT
Shutdown the Zookeeper ensemble cluster.
Remove a node
Add another node
Restart Zookeeper ensemble
--------------------------------------------
On Fri, 1/15/16, Merlin Morgenstern <merlin.morgenstern@gmail.com> wrote:

 Subject: Zookeeper will not join the cluster after removing database post corruption
 To: user@zookeeper.apache.org
 Date: Friday, January 15, 2016, 8:27 AM
 
 I am running a 3 node zookeeper
 ensemble on ubuntu 14.04. After noticing
 that node1 was not running, I had to remove the snapshots of
 this node and
 now the server did start, but refuses to join the cluster.
 
 This is the log from zookeeper:
 
 2016-01-15 17:07:28,187 - INFO  [main:QuorumPeerConfig@101]
 - Reading
 configuration from: /etc/zookeeper/conf/zoo.cfg
 2016-01-15 17:07:28,189 - INFO  [main:QuorumPeerConfig@334]
 -
 Defaulting to majority quorums
 2016-01-15 17:07:28,191 - INFO  [main:DatadirCleanupManager@78]
 -
 autopurge.snapRetainCount set to 3
 2016-01-15 17:07:28,191 - INFO  [main:DatadirCleanupManager@79]
 -
 autopurge.purgeInterval set to 0
 2016-01-15 17:07:28,192 - INFO  [main:DatadirCleanupManager@101]
 -
 Purge task is not scheduled.
 2016-01-15 17:07:28,198 - INFO  [main:QuorumPeerMain@127]
 - Starting quorum peer
 2016-01-15 17:07:28,204 - INFO  [main:NIOServerCnxnFactory@94]
 -
 binding to port 0.0.0.0/0.0.0.0:2181
 2016-01-15 17:07:28,213 - INFO  [main:QuorumPeer@913] -
 tickTime set to 2000
 2016-01-15 17:07:28,213 - INFO  [main:QuorumPeer@933] -
 minSessionTimeout set to -1
 2016-01-15 17:07:28,213 - INFO  [main:QuorumPeer@944] -
 maxSessionTimeout set to -1
 2016-01-15 17:07:28,213 - INFO  [main:QuorumPeer@959] -
 initLimit set to 10
 2016-01-15 17:07:28,220 - INFO  [main:QuorumPeer@429] -
 currentEpoch
 not found! Creating with a reasonable default of 0. This
 should only
 happen when you are upgrading your installation
 2016-01-15 17:07:28,223 - INFO  [main:QuorumPeer@444] -
 acceptedEpoch
 not found! Creating with a reasonable default of 0. This
 should only
 happen when you are upgrading your installation
 2016-01-15 17:07:28,228 - INFO
 [Thread-1:QuorumCnxManager$Listener@486] - My
 election bind port:
 0.0.0.0/0.0.0.0:3888
 2016-01-15 17:07:28,233 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] -
 LOOKING
 2016-01-15 17:07:28,234 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@740]
 - New
 election. My id =  1, proposed zxid=0x0
 2016-01-15 17:07:28,235 - INFO
 [WorkerReceiver[myid=1]:FastLeaderElection@542]
 - Notification: 1
 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state),
 1 (n.sid),
 0x0 (n.peerEPoch), LOOKING (my state)
 2016-01-15 17:07:28,237 - INFO
 [WorkerSender[myid=1]:QuorumCnxManager@190]
 - Have smaller server
 identifier, so dropping the connection: (2, 1)
 2016-01-15 17:07:28,237 - INFO
 [WorkerSender[myid=1]:QuorumCnxManager@190]
 - Have smaller server
 identifier, so dropping the connection: (3, 1)
 2016-01-15 17:07:28,238 - INFO
 [zk1/10.0.0.101:3888:QuorumCnxManager$Listener@493] -
 Received
 connection request /10.0.0.103:48431
 2016-01-15 17:07:28,239 - INFO
 [zk1/10.0.0.101:3888:QuorumCnxManager$Listener@493] -
 Received
 connection request /10.0.0.102:46486
 2016-01-15 17:07:28,239 - INFO
 [WorkerReceiver[myid=1]:FastLeaderElection@542]
 - Notification: 3
 (n.leader), 0x1500000072 (n.zxid), 0x20 (n.round), LEADING
 (n.state),
 3 (n.sid), 0x15 (n.peerEPoch), LOOKING (my state)
 2016-01-15 17:07:28,240 - INFO
 [WorkerReceiver[myid=1]:FastLeaderElection@542]
 - Notification: 3
 (n.leader), 0x1500000072 (n.zxid), 0x20 (n.round),
 FOLLOWING
 (n.state), 2 (n.sid), 0x15 (n.peerEPoch), LOOKING (my
 state)
 2016-01-15 17:07:28,240 - INFO
 [WorkerReceiver[myid=1]:FastLeaderElection@542]
 - Notification: 3
 (n.leader), 0x1500000072 (n.zxid), 0x20 (n.round), LEADING
 (n.state),
 3 (n.sid), 0x15 (n.peerEPoch), LOOKING (my state)
 2016-01-15 17:07:28,240 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@738] -
 FOLLOWING
 2016-01-15 17:07:28,241 - INFO
 [WorkerReceiver[myid=1]:FastLeaderElection@542]
 - Notification: 3
 (n.leader), 0x1500000072 (n.zxid), 0x20 (n.round),
 FOLLOWING
 (n.state), 2 (n.sid), 0x15 (n.peerEPoch), FOLLOWING (my
 state)
 2016-01-15 17:07:28,242 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner@85] - TCP NoDelay
 set
 to: true
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:zookeeper.version=3.4.5--1, built on 06/10/2013
 17:26 GMT
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:host.name=fx1.findix.com
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:java.version=1.8.0_66
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:java.vendor=Oracle Corporation
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:java.home=/usr/lib/jvm/java-8-oracle/jre
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:java.class.path=/etc/zookeeper/conf:/usr/share/java/jline.jar:/usr/share/java/log4j-1.2.jar:/usr/share/java/xercesImpl.jar:/usr/share/java/xmlParserAPIs.jar:/usr/share/java/netty.jar:/usr/share/java/slf4j-api.jar:/usr/share/java/slf4j-log4j12.jar:/usr/share/java/zookeeper.jar
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:java.io.tmpdir=/tmp
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:java.compiler=<NA>
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:os.name=Linux
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:os.arch=amd64
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:os.version=3.19.0-43-generic
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:user.name=zookeeper
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:user.home=/var/lib/zookeeper
 2016-01-15 17:07:28,246 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] -
 Server
 environment:user.dir=/
 2016-01-15 17:07:28,247 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162]
 -
 Created server with tickTime 2000 minSessionTimeout 4000
 maxSessionTimeout 40000 datadir /var/lib/zookeeper/version-2
 snapdir
 /var/lib/zookeeper/version-2
 2016-01-15 17:07:28,247 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@63] - FOLLOWING
 -
 LEADER ELECTION TOOK - 13
 2016-01-15 17:07:28,253 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner@325] - Getting a
 snapshot from leader
 2016-01-15 17:07:28,262 - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240]
 -
 Snapshotting: 0x16000009b6 to
 /var/lib/zookeeper/version-2/snapshot.16000009b6
 2016-01-15 17:07:48,724 - INFO
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
 -
 Accepted socket connection from /10.0.0.101:58156
 2016-01-15 17:07:48,761 - INFO
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821]
 -
 Processing stat command from /10.0.0.101:58156
 2016-01-15 17:07:48,763 - INFO
 [Thread-2:NIOServerCnxn$StatCommand@655] -
 Stat command output
 2016-01-15 17:07:48,763 - INFO  [Thread-2:NIOServerCnxn@1001]
 - Closed
 socket connection for client /10.0.0.101:58156 (no session
 established
 for client)
 2016-01-15 17:08:02,003 - WARN
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@118] - Got
 zxid
 0x16000009b7 expected 0x1
 2016-01-15 17:08:02,003 - INFO  [SyncThread:1:FileTxnLog@199] -
 Creating new log file: log.16000009b7
 
 The relevant entry seems to be: Got zxid 0x16000009b7
 expected 0x1
 
 The configuration of the node was OK I believe, so it must
 be something
 else that is wrong.

Mime
View raw message