Hi hkwan,
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:197)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.net.SocketInputStream.read(SocketInputStream.java:211)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
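A "Connection reset" on a socket read means the TCP connection was reset by
the peer or by something in between, so this looks like a network issue to me.
Have you tried connecting a client from server 2 to the leader (server 3)?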
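
If no ZooKeeper client is handy on server 2, a plain TCP check is a rough
substitute. Below is a minimal sketch, assuming Python 3 is available on that
host. The addresses and ports come from your configuration; the script name,
the helper names and the timeout are my own illustrative choices. It only
tests whether the ports accept connections, not whether ZooKeeper is healthy.

```python
import socket
import sys

# Ports from the quoted zoo.cfg (clientPort=2181, server.N=host:2888:3888).
PORTS = {
    2181: "client port",
    2888: "quorum port (only the current leader listens on it)",
    3888: "leader election port",
}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def report(host, ports=PORTS):
    """Print one line per port saying whether it accepted a connection."""
    for port, role in ports.items():
        status = "open" if can_connect(host, port) else "UNREACHABLE"
        print(f"{host}:{port} ({role}): {status}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_zk_ports.py <server-address>")
    else:
        report(sys.argv[1])
```

Run from server 2, for example `python check_zk_ports.py 10.1.2.221` for the
leader and again with 10.1.0.122 for the follower. Port 2888 should show as
unreachable on the follower, since only the leader accepts connections there.
If 3888 or 2888 is unreachable on the leader, I would look at firewall rules
or routing between the hosts first.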
Regards,
Andor
On Wed, Jan 10, 2018 at 2:16 AM, hkwan <hkwan@centerfield.com> wrote:
> I have a 3-node ensemble in production, and after restarting one node it can
> no longer connect to the ensemble. I am getting the error below:
>
> 2018-01-10 00:49:32,492 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 2)
> 2018-01-10 00:50:20,342 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1, my id = 2, error =
> java.net.SocketException: Connection reset
> at java.net.SocketInputStream.read(SocketInputStream.java:197)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.net.SocketInputStream.read(SocketInputStream.java:211)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
> 2018-01-10 00:50:20,343 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
> 2018-01-10 00:50:20,343 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting for message on queue
> java.lang.InterruptedException
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
> at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
> 2018-01-10 00:50:20,343 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
> 2018-01-10 00:50:32,491 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2018-01-10 00:50:32,493 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
> 2018-01-10 00:50:32,495 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 2)
> 2018-01-10 00:51:32,494 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2018-01-10 00:51:32,494 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
> 2018-01-10 00:51:32,496 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 2)
> 2018-01-10 00:52:19,126 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1, my id = 2, error =
> java.net.SocketException: Connection reset
> at java.net.SocketInputStream.read(SocketInputStream.java:197)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.net.SocketInputStream.read(SocketInputStream.java:211)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
> 2018-01-10 00:52:19,127 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
> 2018-01-10 00:52:19,127 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting for message on queue
> java.lang.InterruptedException
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
> at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
> 2018-01-10 00:52:19,128 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
> 2018-01-10 00:52:32,495 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2018-01-10 00:52:32,497 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
> 2018-01-10 00:52:32,499 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 2)
>
>
> My configuration on all three servers is:
>
> clientPort=2181
> dataDir=/var/opt/zookeeper/data
> tickTime=2000
> autopurge.purgeInterval=24
> initLimit=10
> syncLimit=5
> server.1=10.1.0.122:2888:3888
> server.2=10.1.1.75:2888:3888
> server.3=10.1.2.221:2888:3888
>
> server 3 is currently leader
> server 1 is currently follower
> server 2 currently cannot rejoin the ensemble
>
> The myid files are correctly configured on all three servers. This is a
> production cluster, so I would like to know whether there is a way to force
> the node back into the cluster without anything drastic that would cause
> quorum to be lost.