Date: Sat, 2 Nov 2013 03:31:18 +0000 (UTC)
From: "Raul Gutierrez Segales (JIRA)"
To: dev@zookeeper.apache.org
Subject: [jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811858#comment-13811858 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---------------------------------------------------

I think what's happening is that when we send the initial notifications to all members, as opposed to just voting members as it was before, we trigger a self-replicating cascade of notifications. Each Observer gets the notification and then by virtue of:

{noformat}
/*
 * If it is from a non-voting server (such as an observer or
 * a non-voting follower), respond right away.
 */
if(!self.getVotingView().containsKey(response.sid)){
    .....
}
{noformat}

it replies back to each Observer and so on. So it sounds to me that this needs to match what we have in sendNotifications and actually check response.sid against self.getAllKnownServerIds() to avoid the endless echoing of notifications that I am seeing.

Thoughts [~shralex], [~fpj] ?

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Raul Gutierrez Segales
>             Fix For: 3.5.0
>
>
> Hey [~shralex],
> I noticed today that my Observers are spamming each other trying to open connections to the election port. I've got tons of these:
> {noformat}
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
> {noformat}
> and so on and so on ad nauseam.
> Now, looking around I found this inside FastLeaderElection.java from when you committed ZOOKEEPER-107:
> {noformat}
>     private void sendNotifications() {
> -        for (QuorumServer server : self.getVotingView().values()) {
> -            long sid = server.id;
> -
> +        for (long sid : self.getAllKnownServerIds()) {
> +            QuorumVerifier qv = self.getQuorumVerifier();
> {noformat}
> Is that really desired? I suspect that is what's causing Observers to try to connect to each other (as opposed to just connecting to participants). I'll give it a try now and let you know.
> (Also, we use observer ids that are > 0, and I saw some parts of the code that might not deal with that assumption - so it could be that too..).



--
This message was sent by Atlassian JIRA
(v6.1#6144)
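
To make the suspected echo easier to reason about, here is a toy model of it in Java. It is only a sketch and not ZooKeeper code: the class NotificationEchoSketch, the simulate method, the delivery cap, and the choice of ids 1-3 as voters and 9, 10, 13 as observers are all assumptions made for illustration. Every receiver is modelled as an observer that replies immediately whenever the "respond right away" check accepts the sender id, which is my reading of the snippet quoted in the comment above.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongPredicate;

// Toy model only; none of these names exist in ZooKeeper.
class NotificationEchoSketch {

    /**
     * The initiator sends one notification to every other observer. Each
     * delivery triggers an immediate reply when respondRightAway accepts the
     * sender id. Returns the number of deliveries made before the queue
     * drained or maxDeliveries was reached.
     */
    static int simulate(Set<Long> observers, long initiator,
                        LongPredicate respondRightAway, int maxDeliveries) {
        // Each queue entry is {from, to}.
        Deque<long[]> queue = new ArrayDeque<>();
        for (long sid : observers) {
            if (sid != initiator) {
                queue.add(new long[] {initiator, sid});
            }
        }
        int delivered = 0;
        while (!queue.isEmpty() && delivered < maxDeliveries) {
            long[] msg = queue.poll();
            delivered++;
            if (respondRightAway.test(msg[0])) {
                // Reply to the sender; that reply is itself a notification.
                queue.add(new long[] {msg[1], msg[0]});
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        // Hypothetical ensemble: voters 1-3, observers 9, 10 and 13.
        Set<Long> voters = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> observers = new HashSet<>(Arrays.asList(9L, 10L, 13L));
        Set<Long> allKnown = new HashSet<>(voters);
        allKnown.addAll(observers);

        // Check against the voting view: any observer sender gets a reply.
        int withVotingViewCheck =
                simulate(observers, 13L, sid -> !voters.contains(sid), 1000);
        // Check against all known ids: known observers get no immediate reply.
        int withKnownIdsCheck =
                simulate(observers, 13L, sid -> !allKnown.contains(sid), 1000);

        System.out.println("voting-view check: " + withVotingViewCheck + " deliveries");
        System.out.println("known-ids check:   " + withKnownIdsCheck + " deliveries");
    }
}
```

In this model the voting-view check never lets the queue drain, so the run stops only at the cap of 1000 deliveries, while the known-ids check stops after the 2 initial notifications. Whether the real election code behaves the same way depends on what happens after the quoted check, which the model does not cover.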