zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Oleksandr Sokalskyi1 <Oleksandr_Sokalsk...@epam.com>
Subject Zookeeper cluster suddenly crashes. Zk version 3.4.6
Date Mon, 11 May 2015 17:36:39 GMT
Hi All,



We are using Apache Storm with Apache Zookeepers.

In our configuration we have Zookeeper cluster with 3 nodes in it. Everything was well. But
recently we faced issue that zookeepers got restarted during night, although there are no
calculations or any load during at that period. And it continuously happening on each night.

Zookeeper version is 3.4.6.



Here are the logs for time of failure.



On follower:

2015-05-02 08:20:52,603 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] -
Purge task started.

2015-05-02 08:20:53,044 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] -
Purge task completed.

2015-05-02 08:45:03,416 [myid:3] - WARN  [QuorumPeer[myid=3]/0.0.0.0:2181:Follower@89] - Exception
when following the leader

java.io.EOFException

                at java.io.DataInputStream.readInt(DataInputStream.java:392)

                at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)

                at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)

                at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)

                at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)

                at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)

                at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)

2015-05-02 08:45:03,417 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:Follower@166] -
shutdown called

java.lang.Exception: shutdown Follower

                at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)

                at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)

2015-05-02 08:45:03,418 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007]
- Closed socket connection for client /10.49.5.15:41692 which had sessionid 0x34cef2f0370017c

2015-05-02 08:45:03,418 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007]
- Closed socket connection for client /10.49.5.15:41687 which had sessionid 0x34cef2f0370017a

2015-05-02 08:45:03,419 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007]
- Closed socket connection for client /10.49.5.19:40478 which had sessionid 0x34cef2f0370017d

2015-05-02 08:45:03,419 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007]
- Closed socket connection for client /10.49.5.19:40474 which had sessionid 0x34cef2f0370017b

2015-05-02 08:45:03,420 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007]
- Closed socket connection for client /10.49.5.17:55431 which had sessionid 0x34cef2f03700179

2015-05-02 08:45:03,420 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:FollowerZooKeeperServer@139]
- Shutting down

2015-05-02 08:45:03,420 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:ZooKeeperServer@441]
- shutting down

2015-05-02 08:45:03,420 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:FollowerRequestProcessor@105]
- Shutting down

2015-05-02 08:45:03,421 [myid:3] - INFO  [FollowerRequestProcessor:3:FollowerRequestProcessor@95]
- FollowerRequestProcessor exited loop!

2015-05-02 08:45:03,421 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:CommitProcessor@181]
- Shutting down

2015-05-02 08:45:03,422 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:FinalRequestProcessor@415]
- shutdown of request processor complete

2015-05-02 08:45:03,422 [myid:3] - INFO  [CommitProcessor:3:CommitProcessor@150] - CommitProcessor
exited loop!

2015-05-02 08:45:03,422 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:SyncRequestProcessor@209]
- Shutting down





On leader:



2015-05-02 08:20:45,078 [myid:2] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] -
Purge task started.

2015-05-02 08:20:45,403 [myid:2] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] -
Purge task completed.

2015-05-02 08:45:03,415 [myid:2] - WARN  [QuorumPeer[myid=2]/0.0.0.0:2181:LearnerHandler@687]
- Closing connection to peer due to transaction timeout.

2015-05-02 08:45:03,415 [myid:2] - WARN  [QuorumPeer[myid=2]/0.0.0.0:2181:LearnerHandler@687]
- Closing connection to peer due to transaction timeout.

2015-05-02 08:45:03,416 [myid:2] - WARN  [LearnerHandler-/148.112.48.199:43202:LearnerHandler@646]
- ******* GOODBYE /148.112.48.199:43202 ********

2015-05-02 08:45:03,417 [myid:2] - WARN  [LearnerHandler-/148.112.48.199:43202:LearnerHandler@658]
- Ignoring unexpected exception

java.lang.InterruptedException

                at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)

                at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)

                at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)

                at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)

                at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)

2015-05-02 08:45:03,417 [myid:2] - WARN  [LearnerHandler-/148.112.48.201:33870:LearnerHandler@646]
- ******* GOODBYE /148.112.48.201:33870 ********

2015-05-02 08:45:03,417 [myid:2] - WARN  [LearnerHandler-/148.112.48.201:33870:LearnerHandler@658]
- Ignoring unexpected exception

java.lang.InterruptedException

                at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)

                at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)

                at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)

                at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)

                at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)

2015-05-02 08:45:03,536 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
- Accepted socket connection from /10.49.5.16:53812

2015-05-02 08:45:03,539 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861]
- Client attempting to renew session 0x14cef2ee4c80171 at /10.49.5.16:53812

2015-05-02 08:45:03,539 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617]
- Established session 0x14cef2ee4c80171 with negotiated timeout 20000 for client /10.49.5.16:53812

2015-05-02 08:45:03,582 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
- Accepted socket connection from /10.49.5.19:37378

2015-05-02 08:45:03,585 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861]
- Client attempting to renew session 0x14cef2ee4c80173 at /10.49.5.19:37378

2015-05-02 08:45:03,586 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617]
- Established session 0x14cef2ee4c80173 with negotiated timeout 20000 for client /10.49.5.19:37378

2015-05-02 08:45:03,606 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
- Accepted socket connection from /10.49.5.17:40379

............

2015-05-02 08:45:04,379 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617]
- Established session 0x14cef2ee4c8016f with negotiated timeout 20000 for client /10.49.5.18:52703

2015-05-02 08:45:04,416 [myid:2] - INFO  [QuorumPeer[myid=2]/0.0.0.0:2181:Leader@493] - Shutting
down

2015-05-02 08:45:04,416 [myid:2] - INFO  [QuorumPeer[myid=2]/0.0.0.0:2181:Leader@499] - Shutdown
called

java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced
with sids: [ 2 ]

                at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:499)

                at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:474)

                at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:799)

2015-05-02 08:45:04,417 [myid:2] - INFO  [QuorumPeer[myid=2]/0.0.0.0:2181:NIOServerCnxn@1007]
- Closed socket connection for client /10.49.5.17:53780 which had sessionid 0x24cef2ee4cf016a





I don't understand why it's happening during nights, when there are no load. And everything
running ok during day.

Can you help me to understand the root cause.

Will increasing syncLimit parameter? resolve this issue?



Thanks.


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message