zookeeper-dev mailing list archives

From "Thawan Kooburat (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
Date Mon, 04 Nov 2013 18:58:19 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813111#comment-13813111 ]

Thawan Kooburat commented on ZOOKEEPER-1807:
--------------------------------------------

I believe we have a rather different concern when running a large number of observers. In our
internal deployment, we made a few hacks that essentially kill all observer-to-observer
communication. Observers only observe the result of the election algorithm. We also add a random
delay when an observer tries to reconnect, so that participants have a chance to synchronize
with the leader and form the quorum before the observers take away the leader's bandwidth.
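
The random reconnect delay mentioned above could look roughly like the following. This is only a
sketch of the general jitter idea, not the code from our internal deployment: the class name
ObserverReconnectDelay, the method nextDelayMs and the 5 second upper bound are all hypothetical.

```java
import java.util.Random;

// Hypothetical helper, not part of ZooKeeper: picks a uniformly random
// delay so that many observers do not reconnect at the same instant.
final class ObserverReconnectDelay {
    private final long maxDelayMs;
    private final Random random;

    ObserverReconnectDelay(long maxDelayMs, Random random) {
        if (maxDelayMs < 0) {
            throw new IllegalArgumentException("maxDelayMs must be >= 0");
        }
        this.maxDelayMs = maxDelayMs;
        this.random = random;
    }

    // Returns a delay in the inclusive range [0, maxDelayMs].
    long nextDelayMs() {
        long candidate = (long) (random.nextDouble() * (maxDelayMs + 1));
        // Guard against floating-point rounding pushing the value past the bound.
        return Math.min(maxDelayMs, candidate);
    }

    public static void main(String[] args) {
        // Assumed bound of 5 seconds, purely for illustration.
        ObserverReconnectDelay delay = new ObserverReconnectDelay(5_000, new Random());
        System.out.println("observer would wait " + delay.nextDelayMs() + " ms before reconnecting");
    }
}
```

An observer would sleep for nextDelayMs() before contacting the leader again, which spreads the
reconnects over the chosen window instead of having them all arrive at once.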

My understanding is that with our leader election algorithm, you need to broadcast your vote
whenever your current vote changes, so this generates a lot of messages during the initial
phase of the algorithm. Also, the N x N communication needed by LE is not going to scale for
large deployments. I don't think promoting an observer to a participant is going to be a common
case (it is only needed for DR purposes), so it would be acceptable to have an optional flag to
disable that feature in order to reduce LE overhead with a large number of observers.
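
To put rough numbers on the N x N concern, the sketch below counts directed sender-to-receiver
pairs for one round of vote broadcasts. It assumes a simplified model in which every server sends
one notification to each of its targets; the class ElectionFanout and the example sizes (5
participants, 100 observers) are made up for illustration and are not ZooKeeper code.

```java
// Hypothetical back-of-the-envelope model, not part of ZooKeeper.
final class ElectionFanout {

    // Every server (participant or observer) notifies every other known server.
    static long allToAll(int participants, int observers) {
        long n = (long) participants + observers;
        return n * (n - 1);
    }

    // Notifications are addressed to participants only: each participant
    // notifies the other participants, and each observer notifies all participants.
    static long participantsOnly(int participants, int observers) {
        return (long) participants * (participants - 1)
                + (long) observers * participants;
    }

    public static void main(String[] args) {
        int participants = 5;   // assumed ensemble size
        int observers = 100;    // assumed observer count
        System.out.println(allToAll(participants, observers));         // prints 10920
        System.out.println(participantsOnly(participants, observers)); // prints 520
    }
}
```

Under these assumptions, restricting the targets to participants makes the cost grow linearly
with the number of observers rather than quadratically.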

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Germán Blanco
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1807.patch, notifications-loop.png
>
>
> Hey [~shralex],
> I noticed today that my Observers are spamming each other trying to open connections to the election port. I've got tons of these:
> {noformat}
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
> {noformat}
> and so on and so on ad nauseam.
> Now, looking around I found this inside FastLeaderElection.java from when you committed ZOOKEEPER-107:
> {noformat}
>      private void sendNotifications() {
> -        for (QuorumServer server : self.getVotingView().values()) {
> -            long sid = server.id;
> -
> +        for (long sid : self.getAllKnownServerIds()) {
> +            QuorumVerifier qv = self.getQuorumVerifier();
> {noformat}
> Is that really desired? I suspect that is what's causing Observers to try to connect
> to each other (as opposed to just connecting to participants). I'll give it a try now and
> let you know. (Also, we use observer ids that are > 0, and I saw some parts of the code
> that might not deal with that assumption - so it could be that too.)



--
This message was sent by Atlassian JIRA
(v6.1#6144)
