No, this is not the bug I was thinking of. It looks like the network
connection is poor between the leader and the follower whose logs you
attached. Do you have any network monitoring tools in place, or do you
see any network-related error messages in your kernel logs?

The follower lost its connection to the leader:

2018-01-23 07:40:21,709 [myid:3] - WARN [SyncThread:3:SendAckRequestProcessor@64] - Closing connection to leader, exception during packet send

...and took ages to recover: 944 seconds!

2018-01-23 07:56:05,742 [myid:3] - INFO [QuorumPeer[myid=3]/XX.XX.XX:2181:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 944020

Additionally, a disk write took too long as well:

2018-01-23 07:40:21,706 [myid:3] - WARN [SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 13638ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide

I believe this is worth taking a closer look at; I've sketched a few
things worth checking below the quoted thread. I'm not a ZooKeeper
expert, though, so maybe somebody else can give you more insight.

Regards,
Andor

On Wed, Jan 24, 2018 at 7:47 PM, upendar devu wrote:

> Thanks Andor for the reply.
>
> We are using ZooKeeper version 3.4.6; we have 3 instances; please see
> the configuration below. I believe we are using the default
> configuration. I've attached the zk log; the issue occurred at First
> Occurrence: 01/23/2018 07:42:22, Last Occurrence: 01/23/2018 07:43:22.
>
> The issue occurs 3 to 4 times a month and gets auto-resolved in a few
> minutes, but it is really annoying our operations team. Please let me
> know if you need any additional details.
>
>
> # The number of milliseconds of each tick
> tickTime=2000
>
> # The number of ticks that the initial synchronization phase can take
> initLimit=10
>
> # The number of ticks that can pass between sending a request and
> # getting an acknowledgement
> syncLimit=5
>
> # The directory where the snapshot is stored.
> dataDir=/opt/zookeeper/current/data
>
> # The port at which the clients will connect
> clientPort=2181
>
> # This is the list of ZooKeeper peers:
> server.1=zookeeper1:2888:3888
> server.2=zookeeper2:2888:3888
> server.3=zookeeper3:2888:3888
>
> # The interface IP address(es) on which ZooKeeper will listen
> clientPortAddress=
>
> # The number of snapshots to retain in dataDir
> autopurge.snapRetainCount=3
>
> # Purge task interval in hours
> # Set to "0" to disable the auto purge feature
> autopurge.purgeInterval=1
>
>
> On Wed, Jan 24, 2018 at 4:51 AM, Andor Molnar wrote:
>
>> Hi Upendar,
>>
>> Thanks for reporting the issue.
>> I have a gut feeling about which existing bug you've run into, but
>> would you please share some more detail (version of ZK, log context,
>> config files, etc.) so we can be confident?
>>
>> Thanks,
>> Andor
>>
>>
>> On Wed, Jan 17, 2018 at 4:36 PM, upendar devu wrote:
>>
>> > We are getting the error below twice a month. Although it
>> > auto-resolves, can anyone explain why this error occurs and what
>> > needs to be done to prevent it? Is this a common error that can be
>> > ignored?
>> >
>> > Please suggest.
>> >
>> >
>> > 2018-01-16 20:36:17,378 [myid:2] - WARN
>> > [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken for id
>> > 3, my id = 2, error = java.net.SocketException: Socket closed
>> >     at java.net.SocketInputStream.socketRead0(Native Method)
>> >     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>> >     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>> >     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>> >     at java.net.SocketInputStream.read(SocketInputStream.java:224)
>> >     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>> >     at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
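
---

A few sketches, as promised above.

First, the timeout arithmetic. Your timeouts are derived from tickTime,
so from the zoo.cfg you posted:

    tickTime  = 2000 ms
    initLimit = 10 ticks * 2000 ms = 20 s   (initial sync window)
    syncLimit =  5 ticks * 2000 ms = 10 s   (leader/follower sync window)

A single 13.6 s fsync stall like the one in your log already exceeds the
10 s sync window, which alone could explain the leader dropping the
follower. If I'm right, raising syncLimit buys headroom, but fixing the
disk latency is the real cure; treat any specific value as something to
tune for your environment, not a recommendation from me.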
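Second, for the fsync warning itself: the usual advice is to write the
transaction log to its own dedicated device so snapshots and other I/O
don't compete with it. A minimal zoo.cfg sketch, where the txnlog path
is purely illustrative (substitute a mount point on a separate disk):

    # Keep snapshots where they are...
    dataDir=/opt/zookeeper/current/data
    # ...but put the write-ahead (transaction) log on a dedicated device
    dataLogDir=/var/lib/zookeeper/txnlog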
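Third, to catch the next occurrence in the act, you can poll each server
with the built-in four-letter-word commands and alert when a node falls
out of quorum or latencies spike. For example, against the hostnames
from your config:

    echo stat | nc zookeeper1 2181   # mode (leader/follower) and latency stats
    echo mntr | nc zookeeper1 2181   # machine-readable metrics

That, together with kernel logs and whatever network monitoring you
have, should tell you whether the disk or the network misbehaves first.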