zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Hunt <ph...@apache.org>
Subject Re: Zookeeper quorum goes down for no apparent reason in 3.4.5
Date Mon, 09 Oct 2017 17:48:44 GMT
According to the logs the reads are timing out:

java.net.SocketTimeoutException: Read timed out

this could be any number of things, networking/cpu/.... - resource
starvation. For example I mentioned GC previously, did you turn on GC
logging and rule that out?

The "leaderServes" option might also be useful, hard to say with such
limited insight into your environment. See
http://zookeeper.apache.org/doc/r3.4.10/zookeeperAdmin.html#sc_clusterOptions

Regards,

Patrick

On Thu, Oct 5, 2017 at 10:23 AM, Anand Parthasarathy <
anpartha@avinetworks.com> wrote:

> I will attach the logs inline this time. I looked at the metrics and can't
> find any issue wrt network connectivity or cpu/disk/memory issues. We also
> moved the data directory to /run memfs to avoid fsync latencies as we don't
> need persistency. Could you pls. take a look at the logs and see if you can
> spot something that gives an insight to the underlying issue.
>
> Thanks,
> Anand.
>
> *From node1:*
>
> 2017-09-21 02:57:13,164 [myid:1] - INFO
> [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
> Removing file: Sep 21, 2017 1:17:24 AM
> /run/zookeeper/version-2/log.50006f026
> Removing file: Sep 21, 2017 12:47:17 AM
> /run/zookeeper/version-2/log.50006ecf6
> Removing file: Sep 21, 2017 1:17:24 AM
> /run/zookeeper/version-2/snapshot.50006f375
> Removing file: Sep 21, 2017 12:47:17 AM
> /run/zookeeper/version-2/snapshot.50006f024
> 2017-09-21 02:57:13,164 [myid:1] - INFO
> [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
> 2017-09-21 02:59:32,298 [myid:1] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:Learner@108] - Revalidating
> client: 0x15e5304d29b0046
> 2017-09-21 02:59:34,271 [myid:1] - WARN  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :Follower@89] - Exception when following the leader
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:153)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(
> QuorumPacket.java:83)
> at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2017-09-21 02:59:34,271 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
> at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)
> 2017-09-21 02:59:34,272 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :FollowerZooKeeperServer@139] - Shutting down
> 2017-09-21 02:59:34,272 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :ZooKeeperServer@441] - shutting down
> 2017-09-21 02:59:34,272 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :FollowerRequestProcessor@105] - Shutting down
> 2017-09-21 02:59:34,272 [myid:1] - INFO
> [FollowerRequestProcessor:1:FollowerRequestProcessor@95] -
> FollowerRequestProcessor exited loop!
> 2017-09-21 02:59:34,272 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :CommitProcessor@181] - Shutting down
> 2017-09-21 02:59:34,272 [myid:1] - INFO
> [CommitProcessor:1:CommitProcessor@150] - CommitProcessor exited loop!
> 2017-09-21 02:59:34,272 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :FinalRequestProcessor@415] - shutdown of request processor complete
> 2017-09-21 02:59:34,272 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :SyncRequestProcessor@209] - Shutting down
> 2017-09-21 02:59:34,272 [myid:1] - INFO
> [SyncThread:1:SyncRequestProcessor@187] - SyncRequestProcessor exited!
> 2017-09-21 02:59:34,272 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :QuorumPeer@714] - LOOKING
> 2017-09-21 02:59:34,273 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :FileSnap@83] - Reading snapshot /run/zookeeper/version-2/snaps
> hot.50006fba3
> 2017-09-21 02:59:34,275 [myid:1] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@362] -
> Exception causing close of session 0x0 due to java.io.IOException:
> ZooKeeperServer not running
> 2017-09-21 02:59:34,275 [myid:1] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@362] -
> Exception causing close of session 0x0 due to java.io.IOException:
> ZooKeeperServer not running
> 2017-09-21 02:59:34,277 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :FastLeaderElection@815] - New election. My id =  1, proposed
> zxid=0x50006fec3
> 2017-09-21 02:59:34,277 [myid:1] - INFO
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message
> format version), 1 (n.leader), 0x50006fec3 (n.zxid), 0xd (n.round), LOOKING
> (n.state), 1 (n.sid), 0x5 (n.peerEpoch) LOOKING (my state)
> 2017-09-21 02:59:34,278 [myid:1] - INFO
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message
> format version), 2 (n.leader), 0x50006fec3 (n.zxid), 0xd (n.round), LOOKING
> (n.state), 2 (n.sid), 0x5 (n.peerEpoch) LOOKING (my state)
> 2017-09-21 02:59:34,278 [myid:1] - INFO
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message
> format version), 2 (n.leader), 0x50006fec3 (n.zxid), 0xd (n.round), LOOKING
> (n.state), 1 (n.sid), 0x5 (n.peerEpoch) LOOKING (my state)
> 2017-09-21 02:59:34,478 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :QuorumPeer@784] - FOLLOWING
> 2017-09-21 02:59:34,478 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :ZooKeeperServer@162] - Created server with tickTime 2000
> minSessionTimeout
> 4000 maxSessionTimeout 40000 datadir /run/zookeeper/version-2 snapdir
> /run/zookeeper/version-2
> 2017-09-21 02:59:34,478 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 205
> 2017-09-21 02:59:34,481 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :Learner@323] - Getting a diff from the leader 0x50006fec3
> 2017-09-21 02:59:34,520 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :FileTxnSnapLog@240] - Snapshotting: 0x50006fec3 to
> /run/zookeeper/version-2/snapshot.50006fec3
> 2017-09-21 02:59:34,876 [myid:1] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:Learner@108] - Revalidating
> client: 0x15e5d10a4b50020
> 2017-09-21 02:59:34,944 [myid:1] - WARN  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :Follower@118] - Got zxid 0x600000001 expected 0x1
> 2017-09-21 02:59:35,495 [myid:1] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:Learner@108] - Revalidating
> client: 0x15e901612194a6f
> 2017-09-21 03:00:02,649 [myid:1] - INFO
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message
> format version), 3 (n.leader), 0x40000e237 (n.zxid), 0xc (n.round), LEADING
> (n.state), 3 (n.sid), 0x5 (n.peerEpoch) FOLLOWING (my state)
> 2017-09-21 03:00:02,650 [myid:1] - INFO
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message
> format version), 3 (n.leader), 0x40000e237 (n.zxid), 0xc (n.round), LEADING
> (n.state), 3 (n.sid), 0x5 (n.peerEpoch) FOLLOWING (my state)
> 2017-09-21 03:00:03,651 [myid:1] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:Learner@108] - Revalidating
> client: 0x15e5304d29b0046
> 2017-09-21 03:00:03,651 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002
> :ZooKeeperServer@610] - Invalid session 0x15e5304d29b0046 for client /
> 127.0.0.1:57028, probably expired
> 2017-09-21 03:00:03,660 [myid:1] - INFO
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message
> format version), 3 (n.leader), 0x50006fed5 (n.zxid), 0xd (n.round), LOOKING
> (n.state), 3 (n.sid), 0x5 (n.peerEpoch) FOLLOWING (my state)
> 2017-09-21 03:00:04,748 [myid:1] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x15ea2604fd1000e, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:11,624 [myid:1] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x15ea2604fd10001, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:11,847 [myid:1] - INFO  [SyncThread:1:FileTxnLog@199] -
> Creating new log file: log.6000003ac
> 2017-09-21 03:00:11,847 [myid:1] - INFO  [Snapshot
> Thread:FileTxnSnapLog@240]
> - Snapshotting: 0x6000003ab to /run/zookeeper/version-2/snapshot.6000003ab
> 2017-09-21 03:00:34,482 [myid:1] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x15ea2604fd1006a, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:34,536 [myid:1] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x15ea2604fd1006c, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:34,645 [myid:1] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x15ea2604fd1006e, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:36,104 [myid:1] - INFO  [Snapshot
> Thread:FileTxnSnapLog@240]
> - Snapshotting: 0x60000072a to /run/zookeeper/version-2/snapshot.60000072a
> 2017-09-21 03:00:36,138 [myid:1] - INFO  [SyncThread:1:FileTxnLog@199] -
> Creating new log file: log.60000072c
> 2017-09-21 03:00:36,558 [myid:1] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x15ea2604fd1007c, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:37,770 [myid:1] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x15ea2604fd10089, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
>
> *From node2:*
>
> 2017-09-21 02:56:34,210 [myid:2] - INFO  [Snapshot
> Thread:FileTxnSnapLog@240]
> - Snapshotting: 0x50006fe73 to /run/zookeeper/version-2/snapshot.50006fe73
> 2017-09-21 02:56:34,212 [myid:2] - INFO  [SyncThread:2:FileTxnLog@199] -
> Creating new log file: log.50006fe75
> 2017-09-21 02:57:15,736 [myid:2] - INFO
> [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
> Removing file: Sep 21, 2017 1:39:07 AM
> /run/zookeeper/version-2/log.50006f22d
> Removing file: Sep 21, 2017 1:05:52 AM
> /run/zookeeper/version-2/log.50006f01d
> Removing file: Sep 21, 2017 12:47:06 AM
> /run/zookeeper/version-2/log.50006ec55
> Removing file: Sep 21, 2017 1:39:07 AM
> /run/zookeeper/version-2/snapshot.50006f5df
> Removing file: Sep 21, 2017 1:05:52 AM
> /run/zookeeper/version-2/snapshot.50006f22d
> Removing file: Sep 21, 2017 12:47:06 AM
> /run/zookeeper/version-2/snapshot.50006f01b
> 2017-09-21 02:57:15,737 [myid:2] - INFO
> [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
> 2017-09-21 02:59:31,513 [myid:2] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:Learner@108] - Revalidating
> client: 0x15e5d10a4b50020
> 2017-09-21 02:59:31,748 [myid:2] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x25e8d4776a2557c, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 02:59:32,350 [myid:2] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:Learner@108] - Revalidating
> client: 0x25e8d4776a2557c
> 2017-09-21 02:59:33,221 [myid:2] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:Learner@108] - Revalidating
> client: 0x25e5304d29d003c
> 2017-09-21 02:59:33,393 [myid:2] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:Learner@108] - Revalidating
> client: 0x25e5304d29d003b
> 2017-09-21 02:59:34,272 [myid:2] - WARN  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :Follower@89] - Exception when following the leader
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:153)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(
> QuorumPacket.java:83)
> at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2017-09-21 02:59:34,272 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
> at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)
> 2017-09-21 02:59:34,273 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :FollowerZooKeeperServer@139] - Shutting down
> 2017-09-21 02:59:34,273 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :ZooKeeperServer@441] - shutting down
> 2017-09-21 02:59:34,273 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :FollowerRequestProcessor@105] - Shutting down
> 2017-09-21 02:59:34,273 [myid:2] - INFO
> [FollowerRequestProcessor:2:FollowerRequestProcessor@95] -
> FollowerRequestProcessor exited loop!
> 2017-09-21 02:59:34,273 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :CommitProcessor@181] - Shutting down
> 2017-09-21 02:59:34,273 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :FinalRequestProcessor@415] - shutdown of request processor complete
> 2017-09-21 02:59:34,274 [myid:2] - INFO
> [CommitProcessor:2:CommitProcessor@150] - CommitProcessor exited loop!
> 2017-09-21 02:59:34,274 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :SyncRequestProcessor@209] - Shutting down
> 2017-09-21 02:59:34,274 [myid:2] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:ZooKeeperServer@841] -
> Refusing session request for client /127.0.0.1:59500 as it has seen zxid
> 0x50006febd our last zxid is 0x0 client must try another server
> 2017-09-21 02:59:34,274 [myid:2] - INFO
> [SyncThread:2:SyncRequestProcessor@187] - SyncRequestProcessor exited!
> 2017-09-21 02:59:34,274 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :QuorumPeer@714] - LOOKING
> 2017-09-21 02:59:34,274 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :FileSnap@83] - Reading snapshot /run/zookeeper/version-2/snaps
> hot.50006fe73
> 2017-09-21 02:59:34,276 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :FastLeaderElection@815] - New election. My id =  2, proposed
> zxid=0x50006fec3
> 2017-09-21 02:59:34,277 [myid:2] - INFO
> [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message
> format version), 2 (n.leader), 0x50006fec3 (n.zxid), 0xd (n.round), LOOKING
> (n.state), 2 (n.sid), 0x5 (n.peerEpoch) LOOKING (my state)
> 2017-09-21 02:59:34,277 [myid:2] - INFO
> [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message
> format version), 1 (n.leader), 0x50006fec3 (n.zxid), 0xd (n.round), LOOKING
> (n.state), 1 (n.sid), 0x5 (n.peerEpoch) LOOKING (my state)
> 2017-09-21 02:59:34,277 [myid:2] - INFO
> [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message
> format version), 2 (n.leader), 0x50006fec3 (n.zxid), 0xd (n.round), LOOKING
> (n.state), 1 (n.sid), 0x5 (n.peerEpoch) LOOKING (my state)
> 2017-09-21 02:59:34,404 [myid:2] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@362] -
> Exception causing close of session 0x0 due to java.io.IOException:
> ZooKeeperServer not running
> 2017-09-21 02:59:34,478 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :QuorumPeer@796] - LEADING
> 2017-09-21 02:59:34,478 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :ZooKeeperServer@162] - Created server with tickTime 2000
> minSessionTimeout
> 4000 maxSessionTimeout 40000 datadir /run/zookeeper/version-2 snapdir
> /run/zookeeper/version-2
> 2017-09-21 02:59:34,478 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :Leader@358] - LEADING - LEADER ELECTION TOOK - 204
> 2017-09-21 02:59:34,479 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:39016
> :LearnerHandler@330] - Follower sid: 1 : info :
> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@988c127
> 2017-09-21 02:59:34,479 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:39016
> :LearnerHandler@385] - Synchronizing with Follower sid: 1
> maxCommittedLog=0x50006fec3 minCommittedLog=0x50006fe74
> peerLastZxid=0x50006fec3
> 2017-09-21 02:59:34,480 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:39016
> :LearnerHandler@462] - Sending DIFF
> 2017-09-21 02:59:34,520 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:39016
> :LearnerHandler@522] - Received NEWLEADER-ACK message from 1
> 2017-09-21 02:59:34,520 [myid:2] - INFO  [QuorumPeer[myid=2]/127.0.0.1:
> 5002
> :Leader@943] - Have quorum of supporters, sids: [ 1,2 ]; starting up and
> setting last processed zxid: 0x600000000
> 2017-09-21 02:59:40,000 [myid:2] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x35e68954661a833, timeout of 5000ms exceeded
> 2017-09-21 02:59:40,000 [myid:2] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x25e8d4776a2557c, timeout of 5000ms exceeded
> 2017-09-21 02:59:56,001 [myid:2] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x15e5304d29b0046, timeout of 20000ms exceeded
> 2017-09-21 02:59:56,001 [myid:2] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x25e5304d29d003b, timeout of 20000ms exceeded
> 2017-09-21 02:59:56,001 [myid:2] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x25e5304d29d003c, timeout of 20000ms exceeded
> 2017-09-21 02:59:56,001 [myid:2] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x15e5d10a4b50046, timeout of 20000ms exceeded
> 2017-09-21 03:00:02,649 [myid:2] - INFO
> [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message
> format version), 3 (n.leader), 0x40000e237 (n.zxid), 0xc (n.round), LEADING
> (n.state), 3 (n.sid), 0x5 (n.peerEpoch) LEADING (my state)
> 2017-09-21 03:00:03,659 [myid:2] - INFO
> [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message
> format version), 3 (n.leader), 0x50006fed5 (n.zxid), 0xd (n.round), LOOKING
> (n.state), 3 (n.sid), 0x5 (n.peerEpoch) LEADING (my state)
> 2017-09-21 03:00:03,698 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:39060
> :LearnerHandler@330] - Follower sid: 3 : info :
> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@3753377a
> 2017-09-21 03:00:03,699 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:39060
> :LearnerHandler@385] - Synchronizing with Follower sid: 3
> maxCommittedLog=0x600000086 minCommittedLog=0x50006fe74
> peerLastZxid=0x50006fed5
> 2017-09-21 03:00:03,699 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:39060
> :LearnerHandler@462] - Sending TRUNC
> 2017-09-21 03:00:03,709 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:39060
> :LearnerHandler@522] - Received NEWLEADER-ACK message from 3
> 2017-09-21 03:00:03,978 [myid:2] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x25ea2604ff80006, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:07,354 [myid:2] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x25ea2604ff80025, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:08,928 [myid:2] - INFO  [Snapshot
> Thread:FileTxnSnapLog@240]
> - Snapshotting: 0x60000026c to /run/zookeeper/version-2/snapshot.60000026c
> 2017-09-21 03:00:08,929 [myid:2] - INFO  [SyncThread:2:FileTxnLog@199] -
> Creating new log file: log.60000026e
> 2017-09-21 03:00:22,000 [myid:2] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x35ea260c1fb002f, timeout of 10000ms exceeded
> 2017-09-21 03:00:23,900 [myid:2] - INFO  [Snapshot
> Thread:FileTxnSnapLog@240]
> - Snapshotting: 0x600000489 to /run/zookeeper/version-2/snapshot.600000489
> 2017-09-21 03:00:23,914 [myid:2] - INFO  [SyncThread:2:FileTxnLog@199] -
> Creating new log file: log.60000048b
>
>
> *From node3:*
>
> 2017-09-21 02:52:22,913 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x35e68954661a822, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 02:55:23,499 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x35e68954661a836, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 02:57:17,081 [myid:3] - INFO
> [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
> Removing file: Sep 21, 2017 1:34:26 AM
> /run/zookeeper/version-2/log.50006f20a
> Removing file: Sep 21, 2017 1:04:15 AM
> /run/zookeeper/version-2/log.50006effd
> Removing file: Sep 21, 2017 12:46:00 AM
> /run/zookeeper/version-2/log.50006ec6c
> Removing file: Sep 21, 2017 1:34:26 AM
> /run/zookeeper/version-2/snapshot.50006f55b
> Removing file: Sep 21, 2017 1:04:15 AM
> /run/zookeeper/version-2/snapshot.50006f208
> Removing file: Sep 21, 2017 12:46:00 AM
> /run/zookeeper/version-2/snapshot.50006effb
> 2017-09-21 02:57:17,082 [myid:3] - INFO
> [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
> 2017-09-21 02:59:24,260 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x35e68954661a852, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:02,644 [myid:3] - ERROR [LearnerHandler-/127.0.0.1:40958
> :LearnerHandler@633] - Unexpected exception causing shutdown while sock
> still open
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(
> QuorumPacket.java:83)
> at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at
> org.apache.zookeeper.server.quorum.LearnerHandler.run(Learne
> rHandler.java:546)
> 2017-09-21 03:00:02,645 [myid:3] - WARN  [LearnerHandler-/127.0.0.1:40958
> :LearnerHandler@646] - ******* GOODBYE /127.0.0.1:40958 ********
> 2017-09-21 03:00:02,645 [myid:3] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x15e901612194a6f, timeout of 5000ms exceeded
> 2017-09-21 03:00:02,645 [myid:3] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x25e8d4776a2557c, timeout of 5000ms exceeded
> 2017-09-21 03:00:02,645 [myid:3] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x35e68954661a833, timeout of 5000ms exceeded
> 2017-09-21 03:00:02,645 [myid:3] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x25e5304d29d003c, timeout of 20000ms exceeded
> 2017-09-21 03:00:02,645 [myid:3] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x15e5d10a4b50020, timeout of 20000ms exceeded
> 2017-09-21 03:00:02,645 [myid:3] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x25e5304d29d003b, timeout of 20000ms exceeded
> 2017-09-21 03:00:02,645 [myid:3] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x15e5d10a4b50046, timeout of 20000ms exceeded
> 2017-09-21 03:00:02,645 [myid:3] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x15e5d10a4b50022, timeout of 20000ms exceeded
> 2017-09-21 03:00:02,645 [myid:3] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x35e5d10a4de0029, timeout of 20000ms exceeded
> 2017-09-21 03:00:02,645 [myid:3] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x15e5d10a4b50021, timeout of 20000ms exceeded
> 2017-09-21 03:00:02,645 [myid:3] - INFO  [SessionTracker:ZooKeeperServe
> r@347]
> - Expiring session 0x35e5d10a4de002a, timeout of 20000ms exceeded
> 2017-09-21 03:00:02,648 [myid:3] - INFO
> [WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
> format version), 2 (n.leader), 0x50006fec3 (n.zxid), 0xd (n.round), LOOKING
> (n.state), 2 (n.sid), 0x5 (n.peerEpoch) LEADING (my state)
> 2017-09-21 03:00:02,648 [myid:3] - INFO
> [WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
> format version), 1 (n.leader), 0x50006fec3 (n.zxid), 0xd (n.round), LOOKING
> (n.state), 1 (n.sid), 0x5 (n.peerEpoch) LEADING (my state)
> 2017-09-21 03:00:02,648 [myid:3] - INFO
> [WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
> format version), 2 (n.leader), 0x50006fec3 (n.zxid), 0xd (n.round), LOOKING
> (n.state), 1 (n.sid), 0x5 (n.peerEpoch) LEADING (my state)
> 2017-09-21 03:00:02,649 [myid:3] - ERROR [LearnerHandler-/127.0.0.1:58432
> :LearnerHandler@633] - Unexpected exception causing shutdown while sock
> still open
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(
> QuorumPacket.java:83)
> at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at
> org.apache.zookeeper.server.quorum.LearnerHandler.run(Learne
> rHandler.java:546)
> 2017-09-21 03:00:02,649 [myid:3] - WARN  [LearnerHandler-/127.0.0.1:58432
> :LearnerHandler@646] - ******* GOODBYE /127.0.0.1:58432 ********
> 2017-09-21 03:00:02,651 [myid:3] - INFO  [ProcessThread(sid:3
> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when
> processing sessionid:0x15e5304d29b0046 type:ping cxid:0xfffffffffffffffe
> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:null
> Error:KeeperErrorCode = Session moved
> 2017-09-21 03:00:02,651 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x15e5d10a4b50020, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:02,651 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x25e5304d29d003b, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:02,651 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x35e68954661a833, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:02,652 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x25e5304d29d003c, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:02,652 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x15e5304d29b0046, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:02,652 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x35e5d10a4de0029, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:02,653 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x15e5d10a4b50022, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:02,653 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x35e5d10a4de002a, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:02,653 [myid:3] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:ZooKeeperServer@610] -
> Invalid session 0x35e68954661a833 for client /127.0.0.1:46968, probably
> expired
> 2017-09-21 03:00:02,653 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x15e5d10a4b50021, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:02,654 [myid:3] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:ZooKeeperServer@610] -
> Invalid session 0x25e5304d29d003c for client /127.0.0.1:46982, probably
> expired
> 2017-09-21 03:00:02,654 [myid:3] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:ZooKeeperServer@610] -
> Invalid session 0x15e5d10a4b50046 for client /127.0.0.1:46984, probably
> expired
> 2017-09-21 03:00:02,660 [myid:3] - INFO
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:ZooKeeperServer@610] -
> Invalid session 0x25e5304d29d003b for client /127.0.0.1:46994, probably
> expired
> 2017-09-21 03:00:03,648 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :Leader@493] - Shutting down
> 2017-09-21 03:00:03,648 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :Leader@499] - Shutdown called
> java.lang.Exception: shutdown Leader! reason: Not sufficient followers
> synced, only synced with sids: [ 3 ]
> at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:499)
> at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:474)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:799)
> 2017-09-21 03:00:03,648 [myid:3] - INFO
> [Thread-17:Leader$LearnerCnxAcceptor@322] - exception while shutting down
> acceptor: java.net.SocketException: Socket closed
> 2017-09-21 03:00:03,648 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :ZooKeeperServer@441] - shutting down
> 2017-09-21 03:00:03,649 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :SessionTrackerImpl@225] - Shutting down
> 2017-09-21 03:00:03,649 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :PrepRequestProcessor@761] - Shutting down
> 2017-09-21 03:00:03,649 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :ProposalRequestProcessor@88] - Shutting down
> 2017-09-21 03:00:03,649 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :CommitProcessor@181] - Shutting down
> 2017-09-21 03:00:03,649 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :Leader$ToBeAppliedRequestProcessor@658] - Shutting down
> 2017-09-21 03:00:03,649 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :FinalRequestProcessor@415] - shutdown of request processor complete
> 2017-09-21 03:00:03,649 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :SyncRequestProcessor@209] - Shutting down
> 2017-09-21 03:00:03,649 [myid:3] - INFO  [ProcessThread(sid:3
> cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
> 2017-09-21 03:00:03,649 [myid:3] - INFO
> [CommitProcessor:3:CommitProcessor@150] - CommitProcessor exited loop!
> 2017-09-21 03:00:03,649 [myid:3] - INFO
> [SyncThread:3:SyncRequestProcessor@187] - SyncRequestProcessor exited!
> 2017-09-21 03:00:03,652 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :QuorumPeer@714] - LOOKING
> 2017-09-21 03:00:03,653 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :FileSnap@83] - Reading snapshot /run/zookeeper/version-2/snaps
> hot.50006fcf2
> 2017-09-21 03:00:03,659 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :FastLeaderElection@815] - New election. My id =  3, proposed
> zxid=0x50006fed5
> 2017-09-21 03:00:03,659 [myid:3] - INFO
> [WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
> format version), 3 (n.leader), 0x50006fed5 (n.zxid), 0xd (n.round), LOOKING
> (n.state), 3 (n.sid), 0x5 (n.peerEpoch) LOOKING (my state)
> 2017-09-21 03:00:03,660 [myid:3] - INFO
> [WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
> format version), 2 (n.leader), 0x50006fec3 (n.zxid), 0xd (n.round), LEADING
> (n.state), 2 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)
> 2017-09-21 03:00:03,697 [myid:3] - INFO
> [WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
> format version), 2 (n.leader), 0x50006fec3 (n.zxid), 0xd (n.round),
> FOLLOWING (n.state), 1 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)
> 2017-09-21 03:00:03,697 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :QuorumPeer@784] - FOLLOWING
> 2017-09-21 03:00:03,697 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :ZooKeeperServer@162] - Created server with tickTime 2000
> minSessionTimeout
> 4000 maxSessionTimeout 40000 datadir /run/zookeeper/version-2 snapdir
> /run/zookeeper/version-2
> 2017-09-21 03:00:03,697 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 44
> 2017-09-21 03:00:03,700 [myid:3] - WARN  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :Learner@338] - Truncating log to get in sync with the leader 0x50006fec3
> 2017-09-21 03:00:03,702 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :FileSnap@83] - Reading snapshot /run/zookeeper/version-2/snaps
> hot.50006fcf2
> 2017-09-21 03:00:03,707 [myid:3] - WARN  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :Learner@374] - Got zxid 0x600000001 expected 0x1
> 2017-09-21 03:00:03,708 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :FileTxnSnapLog@240] - Snapshotting: 0x600000086 to
> /run/zookeeper/version-2/snapshot.600000086
> 2017-09-21 03:00:03,773 [myid:3] - INFO  [SyncThread:3:FileTxnLog@199] -
> Creating new log file: log.600000087
> 2017-09-21 03:00:03,776 [myid:3] - WARN  [QuorumPeer[myid=3]/127.0.0.1:
> 5002
> :Follower@118] - Got zxid 0x600000088 expected 0x1
> 2017-09-21 03:00:03,895 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x35ea260c1fb0002, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:03,978 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@362] -
> Exception causing close of session 0x35ea260c1fb0003 due to
> java.io.IOException: Connection reset by peer
> 2017-09-21 03:00:04,000 [myid:3] - INFO
> [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop!
> 2017-09-21 03:00:05,806 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x35ea260c1fb0011, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-21 03:00:09,600 [myid:3] - INFO  [Snapshot
> Thread:FileTxnSnapLog@240]
> - Snapshotting: 0x60000029c to /run/zookeeper/version-2/snapshot.60000029c
> 2017-09-21 03:00:09,606 [myid:3] - INFO  [SyncThread:3:FileTxnLog@199] -
> Creating new log file: log.60000029e
> 2017-09-21 03:00:10,399 [myid:3] - WARN
> [NIOServerCxn.Factory:localhost/127.0.0.1:5002:NIOServerCnxn@357] - caught
> end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x35ea260c1fb002f, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
> erCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
>
> Thanks,
> Anand.
>
> On Thu, Oct 5, 2017 at 9:50 AM, Patrick Hunt <phunt@apache.org> wrote:
>
> > Unfortunately I don't see any attached logs, which makes it difficult to
> > provide you with insight. "Not sufficient followers synced" indicates
> that
> > you're losing followers, likely they are falling behind - what is your
> > metric tracking telling your wrt load on the compute and load on the
> > disk/memory/network/etc... also metrics at the ZK level (e.g. are zk
> > latencies increasing?) Check the logs to see if you're seeing "fsync"
> > slowness issues (it's a warning in the server logs). This is a pretty
> > common issue. GC might also be an issue, although that's more rare these
> > days (hard to say w/o knowing your use case, etc...) Again, look to your
> > metrics collection for insight where to start.
> >
> > Patrick
> >
> > On Wed, Oct 4, 2017 at 11:17 AM, Anand Parthasarathy <
> > anpartha@avinetworks.com> wrote:
> >
> > > Hi,
> > >
> > > We have an issue with a 3-node zookeeper ensemble where the quorum goes
> > > down due to no apparent reason every once in a while. Here is what I
> see
> > in
> > > the ZK leader:
> > >
> > > 2017-09-21 03:00:03,648 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1
> :
> > > 5002:Leader@493] - Shutting down
> > > 2017-09-21 03:00:03,648 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1
> :
> > > 5002:Leader@499] - Shutdown called
> > > java.lang.Exception: shutdown Leader! reason: Not sufficient followers
> > > synced, only synced with sids: [ 3 ]
> > >     at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.
> > java:499)
> > >     at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:474)
> > >     at org.apache.zookeeper.server.quorum.QuorumPeer.run(
> > > QuorumPeer.java:799)
> > >
> > > I have attached the logs from the 3 nodes around this time. Could you
> > pls.
> > > help understand what the issue could be here. The only thing I see a
> > little
> > > bit ahead of this timestamp is that all of them did a PurgeTask pretty
> > much
> > > at the same time.
> > >
> > > Thanks,
> > > Anand.
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message