zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: NullPointerException in ZKDatabase.calculateTxnLogSizeLimit
Date Thu, 28 May 2015 23:41:11 GMT
Hello Jared,

After the leader goes into this state, what does its dataDir look like?
Do you see any files named snapshot.*?  Would you be able to share ls
output or similar showing those files and their sizes?

It seems the only way this condition could occur is if a valid snapshot
could not be found.  The relevant validation code is in
FileSnap#findNValidSnapshots.  This can return an empty list if no valid
snapshots are found.  Then, further up the call stack,
FileSnap#findMostRecentSnapshot would map empty list to null, and
ZKDatabase#calculateTxnLogSizeLimit isn't equipped to handle that null.

--Chris Nauroth




On 5/28/15, 3:34 PM, "Jared Cantwell" <jared.cantwell@gmail.com> wrote:

>We are running a test on a 3-node Zookeeper setup where we power off one
>of
>the nodes for 30 minutes and then power it back on.  After the power on,
>the leader (who was not powered off) gets the following errors and then
>loops with NullPointerException indefinitely, preventing the restarting
>follower from joining the quorum.
>
>I could not find a Jira around this when searching for
>NullPointerException
>and/or ZKDatabase.  It doesn't happen all the time, so we don't believe
>its
>a configuration problem.  Has anyone seen this before?
>
>Thanks!
>~Jared
>
>--------------------------------------------------------------------------
>--------------------------------------------------------
>May 25 05:43:13.501866 zookeeper - INFO  [/11.46.65.86:2183
>:QuorumCnxManager$Listener@550] - Received connection request /
>11.46.65.76:39661
>May 25 05:43:13.502395 zookeeper - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@738] - Interrupted while
>waiting
>for message on queue
>May 25 05:43:13.502395 *java.lang.InterruptedException*
>May 25 05:43:13.502404 localhost     at
>java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.repo
>rtInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>May 25 05:43:13.502422 localhost     at
>java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awai
>tNanos(AbstractQueuedSynchronizer.java:2095)
>May 25 05:43:13.502437 localhost     at
>java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
>May 25 05:43:13.502451 localhost     at
>org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCn
>xManager.java:890)
>May 25 05:43:13.502465 localhost     at
>org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxMa
>nager.java:60)
>May 25 05:43:13.502479 localhost     at
>org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumC
>nxManager.java:726)
>May 25 05:43:13.502565 zookeeper - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@747] - Send worker leaving
>thread  id 1 my id = 8
>May 25 05:43:13.503007 zookeeper - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@821] - Connection broken for id
>1, my id = 8, error =
>May 25 05:43:13.503007 localhost java.net.SocketException: Socket closed
>May 25 05:43:13.503017 localhost     at
>java.net.SocketInputStream.socketRead0(Native Method)
>May 25 05:43:13.503048 localhost     at
>java.net.SocketInputStream.read(SocketInputStream.java:152)
>May 25 05:43:13.503048 localhost     at
>java.net.SocketInputStream.read(SocketInputStream.java:122)
>May 25 05:43:13.503173 localhost     at
>java.net.SocketInputStream.read(SocketInputStream.java:210)
>May 25 05:43:13.503173 localhost     at
>java.io.DataInputStream.readInt(DataInputStream.java:387)
>May 25 05:43:13.503179 localhost     at
>org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumC
>nxManager.java:806)
>May 25 05:43:13.503295 zookeeper - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@824] - Interrupting SendWorker
>May 25 05:43:13.507255 zookeeper - INFO
> [WorkerReceiver[myid=8]:FastLeaderElection@606] - Notification: 1
>(n.leader), 0xb00000ba7 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1
>(n.sid), 0xb (n.peerEPoch), LEADING (my state)8000004f3 (n.config version)
>May 25 05:43:13.509960 zookeeper - INFO
> [WorkerReceiver[myid=8]:FastLeaderElection@606] - Notification: 1
>(n.leader), 0xb00000ba7 (n.zxid), 0x2 (n.round), LOOKING (n.state), 1
>(n.sid), 0xb (n.peerEPoch), LEADING (my state)8000004f3 (n.config version)
>May 25 05:43:13.529574 zookeeper - INFO
>[LearnerHandler-/11.46.65.76:51281
>:LearnerHandler@359] - Follower sid: 1 : info : 11.46.65.76:2182
>:2183:participant;11.46.65.76:2181
>May 25 05:43:13.529782 zookeeper - INFO
>[LearnerHandler-/11.46.65.76:51281
>:LearnerHandler@646] - Synchronizing with Follower sid: 1
>maxCommittedLog=0xb000011b1 minCommittedLog=0xb00000fbd
>lastProcessedZxid=0xb000011b1 peerLastZxid=0xb00000ba7
>May 25 05:43:13.529986 zookeeper - WARN
>[LearnerHandler-/11.46.65.76:51281
>:LearnerHandler@582] - ******* GOODBYE /11.46.65.76:51281 ********
>May 25 05:43:13.530513 zookeeper - ERROR
>[LearnerHandler-/11.46.65.76:51281
>:NIOServerCnxnFactory$1@92] - Thread LearnerHandler
>Socket[addr=/11.46.65.76,port=51281,localport=2182]
>tickOfNextAckDeadline:2389 synced?:true queuedPacketLength:1 died
>May 25 05:43:13.530513 *java.lang.NullPointerException*
>May 25 05:43:13.530538 localhost     at
>org.apache.zookeeper.server.ZKDatabase.calculateTxnLogSizeLimit(ZKDatabase
>.java:276)
>May 25 05:43:13.530538 localhost     at
>org.apache.zookeeper.server.quorum.LearnerHandler.syncFollower(LearnerHand
>ler.java:716)
>May 25 05:43:13.530540 localhost     at
>org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:
>404)
>May 25 05:43:13.782834 zookeeper - INFO
> [WorkerReceiver[myid=8]:FastLeaderElection@606] - Notification: 1
>(n.leader), 0xb00000ba7 (n.zxid), 0x0 (n.round), LOOKING (n.state), 1
>(n.sid), 0xb (n.peerEPoch), LEADING (my state)8000004f3 (n.config version)
>May 25 05:43:13.785988 zookeeper - INFO
>[LearnerHandler-/11.46.65.76:51282
>:LearnerHandler@359] - Follower sid: 1 : info : 11.46.65.76:2182
>:2183:participant;11.46.65.76:2181
>May 25 05:43:13.786310 zookeeper - INFO
>[LearnerHandler-/11.46.65.76:51282
>:LearnerHandler@646] - Synchronizing with Follower sid: 1
>maxCommittedLog=0xb000011b1 minCommittedLog=0xb00000fbd
>lastProcessedZxid=0xb000011b1 peerLastZxid=0xb00000ba7
>May 25 05:43:13.786611 zookeeper - WARN
>[LearnerHandler-/11.46.65.76:51282
>:LearnerHandler@582] - ******* GOODBYE /11.46.65.76:51282 ********
>May 25 05:43:13.787048 zookeeper - ERROR
>[LearnerHandler-/11.46.65.76:51282
>:NIOServerCnxnFactory$1@92] - Thread LearnerHandler
>Socket[addr=/11.46.65.76,port=51282,localport=2182]
>tickOfNextAckDeadline:2389 synced?:true queuedPacketLength:1 died
>May 25 05:43:13.787048 java.lang.NullPointerException
>May 25 05:43:13.787092 localhost     at
>org.apache.zookeeper.server.ZKDatabase.calculateTxnLogSizeLimit(ZKDatabase
>.java:276)
>May 25 05:43:13.787116 localhost     at
>org.apache.zookeeper.server.quorum.LearnerHandler.syncFollower(LearnerHand
>ler.java:716)
>May 25 05:43:13.787138 localhost     at
>org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:
>404)


Mime
View raw message