zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Will Johnson <willjohnsonsea...@gmail.com>
Subject Re: failure modes for 3 node cluster
Date Wed, 15 Jun 2011 12:56:24 GMT
fwiw:  It appears that this happens when the leader happens to be the node
shutdown (node 3 below).  If the leader is chosen to be nodes 1 or 2
everything works fine.  Also, this only happens if node 3 is brought down
cleanly.  If the node is killed (kill -9) everything works as expected.

- will

On Tue, Jun 14, 2011 at 11:20 AM, Will Johnson
<willjohnsonsearch@gmail.com>wrote:

> I’m trying to embed zookeeper inside my application and having trouble
> testing a simple failure scenario.  My test goes as follows:
>
>
>
> Start up a 3 node cluster
>
> Create a client against all 3 nodes
>
> Add data A
>
> Stop node 3
>
> Add data B
>
> Start the previously stopped node 3
>
> Create a new client against node 3 only
>
> Make sure data A + B are present.
>
>
>
> Most of the time this works as expected but every once in a while it hangs
> on multiple platforms (win64 and lin64) on the call to ‘add data B’.  My
> client automatically handles connection loss events and will try to
> automatically replay the message until it succeeds however it is never able
> to.  From reading the doc in the ops guide it seems to say that 3 nodes will
> work “Thus, a deployment that consists of three machines can handle one
> failure.”  Am I missing something?   Do I need more than 3 nodes?  Logs and
> stack traces are the logic for create:
>
>
>
> 11/06/14 10:52:52 INFO ZooKeeper.DEBUGLOG: shutdown 2
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 1,
> my id = 2, error = java.nio.channels.AsynchronousCloseException
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 3,
> my id = 2, error = java.nio.channels.AsynchronousCloseException
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 2,
> my id = 3, error = java.io.IOException: Channel eof
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 2,
> my id = 1, error = java.io.IOException: Channel eof
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting
> for message on queue
>
> java.lang.InterruptedException
>
>                 at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
>
>                 at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
>
>                 at
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
>
>                 at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622)
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting
> for message on queue
>
> java.lang.InterruptedException
>
>                 at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
>
>                 at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
>
>                 at
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
>
>                 at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622)
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread
>
> 11/06/14 10:52:52 INFO quorum.Learner: shutdown called
>
> java.lang.Exception: shutdown Follower
>
>                 at
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
>
>                 at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
>
> 11/06/14 10:52:52 INFO server.FinalRequestProcessor: shutdown of request
> processor complete
>
> 11/06/14 10:52:52 WARN quorum.QuorumPeer: QuorumPeer main thread exited
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting
> for message on queue
>
> java.lang.InterruptedException
>
>                 at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
>
>                 at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
>
>                 at
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
>
>                 at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622)
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting
> for message on queue
>
> java.lang.InterruptedException
>
>                 at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
>
>                 at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
>
>                 at
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
>
>                 at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622)
>
> 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread
>
> 11/06/14 10:52:52 ERROR quorum.QuorumCnxManager: Exception while listening
>
> java.nio.channels.AsynchronousCloseException
>
>                 at
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
>
>                 at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152)
>
>                 at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:478)
>
> 11/06/14 10:52:52 INFO quorum.QuorumCnxManager: Leaving listener
>
> 11/06/14 10:52:52 INFO quorum.FastLeaderElection: WorkerSender is down
>
> 11/06/14 10:52:52 INFO quorum.FastLeaderElection: WorkerReceiver is down
>
> 11/06/14 10:52:52 INFO zookeeper.LoggingWatcher: Saw event: WatchedEvent
> state:Disconnected type:None path:null
>
> 11/06/14 10:52:52 ERROR zookeeper.ZooUtils$AieZooKeeper:
> ATTIVIO-PLATFORM-85 : Failed trying to talk to zookeeper, retrying...
>
>   org.apache.zookeeper.KeeperException$ConnectionLossException -
> KeeperErrorCode = ConnectionLoss for /mugatu
>
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /mugatu
>
>                 at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>
>                 at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>
>                 at
> org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
>
>                 at
> com.attivio.app.config.zookeeper.ZooUtils$AieZooKeeper.create(ZooUtils.java:277)
>
>                 at
> com.attivio.app.config.zoo.ZooServerTest.check(ZooServerTest.java:116)
>
>                 at
> com.attivio.app.config.zoo.ZooServerTest.helloQuroumZoo(ZooServerTest.java:78)
>
>                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
>
>                 at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
>                 at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
>                 at java.lang.reflect.Method.invoke(Method.java:597)
>
>                 at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>
>                 at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>
>                 at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>
>                 at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>
>                 at
> com.attivio.junit.internal.runners.statements.FailAndStackDumpOnTimeout$2.call(FailAndStackDumpOnTimeout.java:63)
>
>                 at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>
>                 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>
>                 at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>
>                 at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
>                 at java.lang.Thread.run(Thread.java:619)
>
> 11/06/14 10:52:53 INFO zookeeper.ClientCnxn: Opening socket connection to
> server localhost/127.0.0.1:16055
>
> 11/06/14 10:52:53 INFO zookeeper.ClientCnxn: Socket connection established
> to localhost/127.0.0.1:16055, initiating session
>
> 11/06/14 10:52:53 INFO server.NIOServerCnxn: Accepted socket connection
> from /127.0.0.1:44515
>
> 11/06/14 10:52:53 INFO server.NIOServerCnxn: Client attempting to renew
> session 0x2308ea3fab10000 at /127.0.0.1:44515
>
> 11/06/14 10:52:53 ERROR server.NIOServerCnxn: Thread
> Thread[NIOServerCxn.Factory:localhost/127.0.0.1:16055,5,main] died
>
> java.lang.AssertionError
>
>                 at
> org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:66)
>
>                 at
> org.apache.zookeeper.server.NIOServerCnxn.finishSessionInit(NIOServerCnxn.java:1552)
>
>                 at
> org.apache.zookeeper.server.ZooKeeperServer.revalidateSession(ZooKeeperServer.java:529)
>
>                 at
> org.apache.zookeeper.server.quorum.LeaderZooKeeperServer.revalidateSession(LeaderZooKeeperServer.java:167)
>
>                 at
> org.apache.zookeeper.server.ZooKeeperServer.reopenSession(ZooKeeperServer.java:537)
>
>                 at
> org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:775)
>
>                 at
> org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:485)
>
>                 at
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:521)
>
>                 at
> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262)
>
> 11/06/14 10:52:56 ERROR quorum.LearnerHandler: Unexpected exception causing
> shutdown while sock still open
>
> java.net.SocketTimeoutException: Read timed out
>
>                 at java.net.SocketInputStream.socketRead0(Native Method)
>
>                 at
> java.net.SocketInputStream.read(SocketInputStream.java:129)
>
>                 at
> java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>
>                 at
> java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>
>                 at
> java.io.DataInputStream.readInt(DataInputStream.java:370)
>
>                 at
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>
>                 at
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>
>                 at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>
>                 at
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:375)
>
> 11/06/14 10:52:56 WARN quorum.LearnerHandler: ******* GOODBYE /
> 127.0.0.1:34413 ********
>
> 11/06/14 10:53:06 INFO zookeeper.ClientCnxn: Client session timed out, have
> not heard from server in 13333ms for sessionid 0x2308ea3fab10000, closing
> socket connection and attempting reconnect
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message