zookeeper-user mailing list archives

From hkwan <hk...@centerfield.com>
Subject Unable to connect node to ensemble after restart of node zookeeper 3.4.6
Date Wed, 10 Jan 2018 01:16:51 GMT
I have a 3-node ensemble in production, and after restarting one node it can
no longer connect to the ensemble.  I am getting the errors below:

2018-01-10 00:49:32,492 [myid:2] - INFO 
[WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server
identifier, so dropping the connection: (3, 2)
2018-01-10 00:50:20,342 [myid:2] - WARN 
[RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1,
my id = 2, error =
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:197)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.net.SocketInputStream.read(SocketInputStream.java:211)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
2018-01-10 00:50:20,343 [myid:2] - WARN 
[RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
2018-01-10 00:50:20,343 [myid:2] - WARN 
[SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting
for message on queue
java.lang.InterruptedException
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
        at
java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
        at
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
        at
org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
        at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
2018-01-10 00:50:20,343 [myid:2] - WARN 
[SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
2018-01-10 00:50:32,491 [myid:2] - INFO 
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] -
Notification time out: 60000
2018-01-10 00:50:32,493 [myid:2] - INFO 
[WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message
format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING
(n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
2018-01-10 00:50:32,495 [myid:2] - INFO 
[WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server
identifier, so dropping the connection: (3, 2)
2018-01-10 00:51:32,494 [myid:2] - INFO 
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] -
Notification time out: 60000
2018-01-10 00:51:32,494 [myid:2] - INFO 
[WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message
format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING
(n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
2018-01-10 00:51:32,496 [myid:2] - INFO 
[WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server
identifier, so dropping the connection: (3, 2)
2018-01-10 00:52:19,126 [myid:2] - WARN 
[RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1,
my id = 2, error =
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:197)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.net.SocketInputStream.read(SocketInputStream.java:211)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
2018-01-10 00:52:19,127 [myid:2] - WARN 
[RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
2018-01-10 00:52:19,127 [myid:2] - WARN 
[SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting
for message on queue
java.lang.InterruptedException
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
        at
java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
        at
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
        at
org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
        at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
2018-01-10 00:52:19,128 [myid:2] - WARN 
[SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
2018-01-10 00:52:32,495 [myid:2] - INFO 
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] -
Notification time out: 60000
2018-01-10 00:52:32,497 [myid:2] - INFO 
[WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message
format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING
(n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
2018-01-10 00:52:32,499 [myid:2] - INFO 
[WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server
identifier, so dropping the connection: (3, 2)
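
For reading the Notification lines above: a minimal sketch, assuming the usual ZooKeeper zxid layout (high 32 bits are the epoch, low 32 bits are a per-epoch counter). The function name split_zxid is only illustrative and is not part of ZooKeeper.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter).

    Assumes the high 32 bits hold the epoch and the low 32 bits
    hold the transaction counter within that epoch.
    """
    epoch = zxid >> 32
    counter = zxid & 0xFFFFFFFF
    return epoch, counter


# n.zxid taken from the notification in the log above
epoch, counter = split_zxid(0x707E3E9A9)
print(f"epoch={epoch:#x} counter={counter:#x}")
```

Under that assumption the epoch part comes out as 0x7, which agrees with the n.peerEpoch value printed in the same notification.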


My configuration on all three servers is:

clientPort=2181
dataDir=/var/opt/zookeeper/data
tickTime=2000
autopurge.purgeInterval=24
initLimit=10
syncLimit=5
server.1=10.1.0.122:2888:3888
server.2=10.1.1.75:2888:3888
server.3=10.1.2.221:2888:3888
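
To double-check that the quorum and election ports in the server.N lines above are reachable between nodes, something like the following could be run from each server. This is a rough sketch only: Peer, parse_servers and can_connect are hypothetical helper names, and a successful TCP connect only shows that the port is open, not that the election protocol is healthy.

```python
import socket
from typing import Dict, NamedTuple


class Peer(NamedTuple):
    host: str
    quorum_port: int    # first port in server.N (2888 here)
    election_port: int  # second port in server.N (3888 here)


def parse_servers(cfg_text: str) -> Dict[int, Peer]:
    """Collect server.N=host:quorumPort:electionPort entries by server id."""
    peers = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line.startswith("server."):
            continue
        key, _, value = line.partition("=")
        sid = int(key[len("server."):])
        host, quorum_port, election_port = value.split(":")[:3]
        peers[sid] = Peer(host, int(quorum_port), int(election_port))
    return peers


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


CFG = """\
clientPort=2181
server.1=10.1.0.122:2888:3888
server.2=10.1.1.75:2888:3888
server.3=10.1.2.221:2888:3888
"""
peers = parse_servers(CFG)
```

For example, calling can_connect(peers[3].host, peers[3].election_port) on server 2 would test the path from server 2 to the election port of server 3.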

Server 3 is currently the leader.
Server 1 is currently a follower.
Server 2 currently cannot rejoin the ensemble.
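
The roles above can be confirmed with the srvr four-letter command on the client port. A minimal sketch, assuming the four-letter commands are reachable on clientPort 2181 from wherever this runs; parse_mode, query_mode and report are illustrative names only.

```python
import socket


def parse_mode(srvr_output: str) -> str:
    """Return the value of the 'Mode:' line from srvr output."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    # No Mode: line found (for example, the server is not serving requests)
    return "unknown"


def query_mode(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send 'srvr' to host:port and return the reported mode."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", "replace"))


def report(hosts):
    """Print the mode of each host, or the connection error."""
    for host in hosts:
        try:
            print(host, query_mode(host))
        except OSError as exc:
            print(host, "unreachable:", exc)
```

Calling report(["10.1.0.122", "10.1.1.75", "10.1.2.221"]) would print one line per server.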

The myid files are correctly configured on all three servers.  This is a
production cluster, so I would like to know if there is a way to force the
node back into the cluster without doing anything drastic that would cause
quorum to be lost.





--
Sent from: http://zookeeper-user.578899.n2.nabble.com/
