Return-Path: X-Original-To: apmail-zookeeper-dev-archive@www.apache.org Delivered-To: apmail-zookeeper-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9430910E40 for ; Sat, 2 Nov 2013 02:39:18 +0000 (UTC) Received: (qmail 83790 invoked by uid 500); 2 Nov 2013 02:39:17 -0000 Delivered-To: apmail-zookeeper-dev-archive@zookeeper.apache.org Received: (qmail 83726 invoked by uid 500); 2 Nov 2013 02:39:17 -0000 Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@zookeeper.apache.org Delivered-To: mailing list dev@zookeeper.apache.org Received: (qmail 83710 invoked by uid 99); 2 Nov 2013 02:39:17 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 02 Nov 2013 02:39:17 +0000 Date: Sat, 2 Nov 2013 02:39:17 +0000 (UTC) From: "Raul Gutierrez Segales (JIRA)" To: dev@zookeeper.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811849#comment-13811849 ] Raul Gutierrez Segales commented on ZOOKEEPER-1807: --------------------------------------------------- Okey - this seems to actually be related to ZOOKEEPER-107, [~shralex]. I added some debugging logging and I've see that the spam, to all Observers, are the notifications: {noformat} 2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 9, peerEpoch = 130, configData = [B@5a0c0ce6 2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 12, peerEpoch = 130, configData = [B@4d22fe39 2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 6, peerEpoch = 130, configData = [B@346077bf 2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 13, peerEpoch = 130, configData = [B@2955b776 2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 11, peerEpoch = 130, configData = [B@3a7fb92d 2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 14, peerEpoch = 130, configData = [B@1756575c 2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 13, peerEpoch = 130, configData = [B@258164fc {noformat} As you can see, it's sending tons of notifications per second. Not good :) With this diff in FastLeaderElection.java (i.e.: a revert of part of your change): {noformat} private void sendNotifications() { - for (long sid : self.getAllKnownServerIds()) { + for (QuorumServer server : self.getVotingView().values()) { + long sid = server.id; {noformat} observers, of course, don't get spammed. I am guessing some condition is failing for Observers that assumes the notifications are fresh and sends them repeatedly? > Observers spam each other creating connections to the election addr > ------------------------------------------------------------------- > > Key: ZOOKEEPER-1807 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807 > Project: ZooKeeper > Issue Type: Bug > Reporter: Raul Gutierrez Segales > Assignee: Raul Gutierrez Segales > Fix For: 3.5.0 > > > Hey [~shralex], > I noticed today that my Observers are spamming each other trying to open connections to the election port. I've got tons of these: > {noformat} > 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9 > 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10 > 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6 > 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12 > 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14 > {noformat} > and so and so on ad nauseam. > Now, looking around I found this inside FastLeaderElection.java from when you committed ZOOKEEPER-107: > {noformat} > private void sendNotifications() { > - for (QuorumServer server : self.getVotingView().values()) { > - long sid = server.id; > - > + for (long sid : self.getAllKnownServerIds()) { > + QuorumVerifier qv = self.getQuorumVerifier(); > {noformat} > Is that really desired? I suspect that is what's causing Observers to try to connect to each other (as opposed as just connecting to participants). I'll give it a try now and let you know. (Also, we use observer ids that are > 0, and I saw some parts of the code that might not deal with that assumption - so it could be that too..). -- This message was sent by Atlassian JIRA (v6.1#6144)