Return-Path: X-Original-To: apmail-zookeeper-dev-archive@www.apache.org Delivered-To: apmail-zookeeper-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id F03B04A19 for ; Tue, 5 Jul 2011 10:17:54 +0000 (UTC) Received: (qmail 41178 invoked by uid 500); 5 Jul 2011 10:17:54 -0000 Delivered-To: apmail-zookeeper-dev-archive@zookeeper.apache.org Received: (qmail 40324 invoked by uid 500); 5 Jul 2011 10:17:45 -0000 Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@zookeeper.apache.org Delivered-To: mailing list dev@zookeeper.apache.org Received: (qmail 40305 invoked by uid 99); 5 Jul 2011 10:17:42 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Jul 2011 10:17:42 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Jul 2011 10:17:38 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id E221C43648 for ; Tue, 5 Jul 2011 10:17:16 +0000 (UTC) Date: Tue, 5 Jul 2011 10:17:16 +0000 (UTC) From: "Flavio Junqueira (JIRA)" To: dev@zookeeper.apache.org Message-ID: <525686677.170.1309861036923.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <522318670.7997.1309504828447.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (ZOOKEEPER-1115) follower can not sync with leader MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/ZOOKEEPER-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059817#comment-13059817 ] Flavio Junqueira commented on ZOOKEEPER-1115: --------------------------------------------- What about increasing syncLimit? > follower can not sync with leader > --------------------------------- > > Key: ZOOKEEPER-1115 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1115 > Project: ZooKeeper > Issue Type: Bug > Components: quorum > Affects Versions: 3.3.0, 3.3.3 > Environment: linux rhel 4, x64, java version 1.4.2 > Reporter: helei > Priority: Critical > > there are 5 members in the quorum. one follower can not sync with leader after restart. it seems leader has close the data connection with this follower because of read timeout. here is the key log in follower: > 2011-06-30 22:14:45,069 - WARN [Thread-17:QuorumCnxManager$RecvWorker@658] - Connection broken: > java.nio.channels.ClosedChannelException > at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:113) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:156) > at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629) > 2011-06-30 22:14:45,069 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@689] - Notification: 3, 17198470148, 3, 3, LOOKING, LOOKING, 3 > 2011-06-30 22:14:45,070 - ERROR [Thread-16:QuorumCnxManager$SendWorker@559] - Failed to send last message. Shutting down thread. > java.nio.channels.ClosedChannelException > at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324) > at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.send(QuorumCnxManager.java:548) > at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:557) > 2011-06-30 22:14:45,082 - INFO [QuorumPeer:/0.0.0.0:2181:Learner@282] - Getting a diff from the leader 0x4011bd462 > 2011-06-30 22:14:45,083 - WARN [Thread-18:QuorumCnxManager$SendWorker@589] - Send worker leaving thread > 2011-06-30 22:14:45,085 - WARN [QuorumPeer:/0.0.0.0:2181:Follower@116] - Got zxid 0x4011bd405 expected 0x1 > 2011-06-30 22:14:45,090 - INFO [QuorumPeer:/0.0.0.0:2181:FileTxnSnapLog@208] - Snapshotting: 4011bd462 > 2011-06-30 22:14:53,397 - WARN [SyncThread:3:SendAckRequestProcessor@63] - Closing connection to leader, exception during packet send > java.net.SocketException: Broken pipe > at java.net.SocketOutputStream.socketWrite0(Native Method) > at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) > at java.net.SocketOutputStream.write(SocketOutputStream.java:136) > at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) > at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126) > at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:61) > at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:164) > at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98) > 2011-06-30 22:14:53,398 - WARN [QuorumPeer:/0.0.0.0:2181:Follower@82] - Exception when following the leader > java.net.SocketException: Socket closed > at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99) > at java.net.SocketOutputStream.write(SocketOutputStream.java:136) > at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) > at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126) > at org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:358) > at org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:108) > at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:79) > at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:634) > 2011-06-30 22:14:53,398 - WARN [SyncThread:3:SendAckRequestProcessor@63] - Closing connection to leader, exception during packet send > java.net.SocketException: Socket closed > at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99) > at java.net.SocketOutputStream.write(SocketOutputStream.java:136) > at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) > at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126) > at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:61) > at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:164) > at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98) > 2011-06-30 22:14:53,399 - INFO [QuorumPeer:/0.0.0.0:2181:Follower@166] - shutdown called > java.lang.Exception: shutdown Follower > at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166) > at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:638) > and these are the leader's: > 2011-06-30 22:14:35,943 - ERROR [LearnerHandler-/10.23.247.163:14975:LearnerHandler@444] - Unexpected exception causing shutdown while sock still open > java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:129) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > at java.io.BufferedInputStream.read(BufferedInputStream.java:237) > at java.io.DataInputStream.readInt(DataInputStream.java:370) > at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84) > at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) > at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:358) > 2011-06-30 22:14:35,943 - WARN [LearnerHandler-/10.23.247.163:14975:LearnerHandler@457] - ******* GOODBYE /10.23.247.163:14975 ******** > 2011-06-30 22:14:48,943 - ERROR [CommitProcessor:4:NIOServerCnxn@422] - Unexpected Exception: > java.nio.channels.CancelledKeyException > at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) > at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) > at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395) > at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1360) > at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367) > at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:535) > at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) > 2011-06-30 22:14:49,084 - ERROR [LearnerHandler-/10.23.247.163:14998:LearnerHandler@444] - Unexpected exception causing shutdown while sock still open > java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:129) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > at java.io.BufferedInputStream.read(BufferedInputStream.java:237) > at java.io.DataInputStream.readInt(DataInputStream.java:370) > at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84) > at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) > at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:358) > 2011-06-30 22:14:49,084 - WARN [LearnerHandler-/10.23.247.163:14998:LearnerHandler@457] - ******* GOODBYE /10.23.247.163:14998 ******** -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira