zookeeper-user mailing list archives

From Andor Molnar <an...@cloudera.com>
Subject Re: Unable to connect node to ensemble after restart of node zookeeper 3.4.6
Date Wed, 10 Jan 2018 09:05:20 GMT
Hi hkwan,

java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:197)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.net.SocketInputStream.read(SocketInputStream.java:211)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)

The "Connection reset" on the connection to server 1 looks like a network issue to me.
Have you tried connecting a client from server 2 to the leader (server 3)?
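
If it helps, here is a minimal sketch of such a check in Python. It only tests whether a TCP connection can be opened, nothing ZooKeeper-specific. The addresses and ports are copied from the configuration quoted below; the names check_port and report are made up for illustration. It assumes that only the current leader listens on the quorum port (2888), so that port is probed on the leader only.

```python
import socket

# Addresses and ports copied from the zoo.cfg quoted in this thread.
PEERS = {
    1: "10.1.0.122",
    2: "10.1.1.75",
    3: "10.1.2.221",
}
CLIENT_PORT = 2181    # clientPort
QUORUM_PORT = 2888    # first port in server.N; assumed open on the leader only
ELECTION_PORT = 3888  # second port in server.N; used for leader election


def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, resets, timeouts and routing errors.
        return False


def report(my_id, leader_id):
    """Print reachability of every other server as seen from this host."""
    for sid, host in sorted(PEERS.items()):
        if sid == my_id:
            continue
        ports = [CLIENT_PORT, ELECTION_PORT]
        if sid == leader_id:
            ports.append(QUORUM_PORT)
        for port in ports:
            state = "open" if check_port(host, port) else "UNREACHABLE"
            print(f"server.{sid} {host}:{port} {state}")
```

On server 2 you would call report(my_id=2, leader_id=3). It is probably worth running report(my_id=3, leader_id=3) on server 3 as well: as far as I can tell from the "Have smaller server identifier, so dropping the connection: (3, 2)" message, server 2 closes its own election connection to server 3 and expects server 3 to connect back to it on port 3888, so that direction has to work too.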

Regards,
Andor



On Wed, Jan 10, 2018 at 2:16 AM, hkwan <hkwan@centerfield.com> wrote:

> I have a 3-node ensemble in production, and after restarting one node it can
> no longer connect to the ensemble. I am getting the error below:
>
> 2018-01-10 00:49:32,492 [myid:2] - INFO
> [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server
> identifier, so dropping the connection: (3, 2)
> 2018-01-10 00:50:20,342 [myid:2] - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id
> 1,
> my id = 2, error =
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:197)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at java.net.SocketInputStream.read(SocketInputStream.java:211)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(
> QuorumCnxManager.java:765)
> 2018-01-10 00:50:20,343 [myid:2] - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
> 2018-01-10 00:50:20,343 [myid:2] - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting
> for message on queue
> java.lang.InterruptedException
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.
> reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$
> ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
>         at
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
>         at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(
> QuorumCnxManager.java:849)
>         at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.
> access$500(QuorumCnxManager.java:64)
>         at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(
> QuorumCnxManager.java:685)
> 2018-01-10 00:50:20,343 [myid:2] - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving
> thread
> 2018-01-10 00:50:32,491 [myid:2] - INFO
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] -
> Notification time out: 60000
> 2018-01-10 00:50:32,493 [myid:2] - INFO
> [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message
> format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING
> (n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
> 2018-01-10 00:50:32,495 [myid:2] - INFO
> [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server
> identifier, so dropping the connection: (3, 2)
> 2018-01-10 00:51:32,494 [myid:2] - INFO
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] -
> Notification time out: 60000
> 2018-01-10 00:51:32,494 [myid:2] - INFO
> [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message
> format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING
> (n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
> 2018-01-10 00:51:32,496 [myid:2] - INFO
> [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server
> identifier, so dropping the connection: (3, 2)
> 2018-01-10 00:52:19,126 [myid:2] - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id
> 1,
> my id = 2, error =
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:197)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at java.net.SocketInputStream.read(SocketInputStream.java:211)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(
> QuorumCnxManager.java:765)
> 2018-01-10 00:52:19,127 [myid:2] - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
> 2018-01-10 00:52:19,127 [myid:2] - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting
> for message on queue
> java.lang.InterruptedException
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.
> reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$
> ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
>         at
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
>         at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(
> QuorumCnxManager.java:849)
>         at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.
> access$500(QuorumCnxManager.java:64)
>         at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(
> QuorumCnxManager.java:685)
> 2018-01-10 00:52:19,128 [myid:2] - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving
> thread
> 2018-01-10 00:52:32,495 [myid:2] - INFO
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] -
> Notification time out: 60000
> 2018-01-10 00:52:32,497 [myid:2] - INFO
> [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message
> format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING
> (n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
> 2018-01-10 00:52:32,499 [myid:2] - INFO
> [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server
> identifier, so dropping the connection: (3, 2)
>
>
> My configuration on all three servers is:
>
> clientPort=2181
> dataDir=/var/opt/zookeeper/data
> tickTime=2000
> autopurge.purgeInterval=24
> initLimit=10
> syncLimit=5
> server.1=10.1.0.122:2888:3888
> server.2=10.1.1.75:2888:3888
> server.3=10.1.2.221:2888:3888
>
> Server 3 is currently the leader.
> Server 1 is currently a follower.
> Server 2 currently cannot rejoin the ensemble.
>
> The myid files are correctly configured on all three servers. This is a
> production cluster, so I would like to know whether there is a way to force
> the node back into the ensemble without doing anything drastic that would
> cause the quorum to be lost.
>
>
>
>
>
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/
>
