zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "helei (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-1115) follower can not sync with leader
Date Mon, 04 Jul 2011 05:54:22 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059326#comment-13059326
] 

helei commented on ZOOKEEPER-1115:
----------------------------------

Hi, guys. After follower get synced with leader, we will call zk.takeSnapshot in communication
thread in Learner.java. And this is a function may block. In my case, it takes about 4 seconds
to finish the snap. During this period, follower would not ack leader, and leader would thought
that something happend to this follower, and close the connection with this follower. After
that the follower will receive a broken pipe exception, and it will retry after that. Of cause
it will fail time after time.
I think it is a problem.
thanks

> follower can not sync with leader
> ---------------------------------
>
>                 Key: ZOOKEEPER-1115
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1115
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.3
>         Environment: linux rhel 4, x64, java version 1.4.2
>            Reporter: helei
>            Priority: Critical
>
> there are 5 members in the quorum. one follower can not sync with leader after restart.
it seems leader has close the data connection with this follower because of read timeout.
here is the key log in follower:
> 2011-06-30 22:14:45,069 - WARN  [Thread-17:QuorumCnxManager$RecvWorker@658] - Connection
broken: 
> java.nio.channels.ClosedChannelException
>         at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:113)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:156)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629)
> 2011-06-30 22:14:45,069 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@689] - Notification:
3, 17198470148, 3, 3, LOOKING, LOOKING, 3
> 2011-06-30 22:14:45,070 - ERROR [Thread-16:QuorumCnxManager$SendWorker@559] - Failed
to send last message. Shutting down thread.
> java.nio.channels.ClosedChannelException
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.send(QuorumCnxManager.java:548)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:557)
> 2011-06-30 22:14:45,082 - INFO  [QuorumPeer:/0.0.0.0:2181:Learner@282] - Getting a diff
from the leader 0x4011bd462
> 2011-06-30 22:14:45,083 - WARN  [Thread-18:QuorumCnxManager$SendWorker@589] - Send worker
leaving thread
> 2011-06-30 22:14:45,085 - WARN  [QuorumPeer:/0.0.0.0:2181:Follower@116] - Got zxid 0x4011bd405
expected 0x1
> 2011-06-30 22:14:45,090 - INFO  [QuorumPeer:/0.0.0.0:2181:FileTxnSnapLog@208] - Snapshotting:
4011bd462
> 2011-06-30 22:14:53,397 - WARN  [SyncThread:3:SendAckRequestProcessor@63] - Closing connection
to leader, exception during packet send
> java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126)
>         at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:61)
>         at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:164)
>         at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
> 2011-06-30 22:14:53,398 - WARN  [QuorumPeer:/0.0.0.0:2181:Follower@82] - Exception when
following the leader
> java.net.SocketException: Socket closed
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126)
>         at org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:358)
>         at org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:108)
>         at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:79)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:634)
> 2011-06-30 22:14:53,398 - WARN  [SyncThread:3:SendAckRequestProcessor@63] - Closing connection
to leader, exception during packet send
> java.net.SocketException: Socket closed
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126)
>         at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:61)
>         at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:164)
>         at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
> 2011-06-30 22:14:53,399 - INFO  [QuorumPeer:/0.0.0.0:2181:Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
>         at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:638)
> and these are the leader's:
> 2011-06-30 22:14:35,943 - ERROR [LearnerHandler-/10.23.247.163:14975:LearnerHandler@444]
- Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:358)
> 2011-06-30 22:14:35,943 - WARN  [LearnerHandler-/10.23.247.163:14975:LearnerHandler@457]
- ******* GOODBYE /10.23.247.163:14975 ********
> 2011-06-30 22:14:48,943 - ERROR [CommitProcessor:4:NIOServerCnxn@422] - Unexpected Exception:

> java.nio.channels.CancelledKeyException
>         at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
>         at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
>         at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395)
>         at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1360)
>         at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
>         at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:535)
>         at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
> 2011-06-30 22:14:49,084 - ERROR [LearnerHandler-/10.23.247.163:14998:LearnerHandler@444]
- Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:358)
> 2011-06-30 22:14:49,084 - WARN  [LearnerHandler-/10.23.247.163:14998:LearnerHandler@457]
- ******* GOODBYE /10.23.247.163:14998 ********

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message