zookeeper-dev mailing list archives

From "Raul Gutierrez Segales (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
Date Sat, 02 Nov 2013 02:39:17 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811849#comment-13811849 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---------------------------------------------------

OK, this does actually seem to be related to ZOOKEEPER-107, [~shralex]. I added some debug
logging and I've seen that the spam, sent to all Observers, consists of notifications:

{noformat}
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 9, peerEpoch = 130, configData = [B@5a0c0ce6
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 12, peerEpoch = 130, configData = [B@4d22fe39
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 6, peerEpoch = 130, configData = [B@346077bf
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 13, peerEpoch = 130, configData = [B@2955b776
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 11, peerEpoch = 130, configData = [B@3a7fb92d
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 14, peerEpoch = 130, configData = [B@1756575c
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 13, peerEpoch = 130, configData = [B@258164fc
{noformat}

As you can see, it's sending tons of notifications per second. Not good :)
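
To put a number on "tons", a rough tally of the "will send:" lines per timestamp helps. The following is only a minimal sketch: the class and method names are made up for illustration, it is not part of ZooKeeper, and it assumes the log format shown in the excerpt, where the timestamp is everything before the first " - " separator.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper only; NotificationTally is a hypothetical name, not a ZooKeeper class.
final class NotificationTally {

    // Counts "will send:" lines per timestamp. The timestamp is assumed to be
    // the text before the first " - " separator, as in the log excerpt.
    static Map<String, Integer> perTimestamp(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (!line.contains("will send:")) {
                continue; // not one of the notification debug lines
            }
            int sep = line.indexOf(" - ");
            if (sep < 0) {
                continue; // line does not follow the assumed layout
            }
            counts.merge(line.substring(0, sep), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shortened sample lines in the same layout as the excerpt.
        List<String> sample = Arrays.asList(
            "2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 3, sid = 9",
            "2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 3, sid = 12");
        System.out.println(perTimestamp(sample));
    }
}
```

Grouping by the full millisecond timestamp is deliberate: all seven lines in the excerpt share one timestamp, so a per-millisecond count shows the burst directly.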

With this diff in FastLeaderElection.java (i.e.: a revert of part of your change):

{noformat}
     private void sendNotifications() {
-        for (long sid : self.getAllKnownServerIds()) {
+        for (QuorumServer server : self.getVotingView().values()) {
+            long sid = server.id;
{noformat}

Observers, of course, no longer get spammed. I am guessing that some condition which assumes
the notifications are fresh is failing for Observers, so they get sent repeatedly?
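
To make the difference between the two loops concrete, here is a small stand-alone model of the recipient sets. It is a sketch under stated assumptions, not ZooKeeper code: the `Role` enum, the method names and the sample ensemble (participants 1-3, observers 11-13) are all invented for illustration, and the real `getVotingView()` and `getAllKnownServerIds()` live on the quorum peer and are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of an ensemble; these are not ZooKeeper's actual classes.
final class NotificationTargets {
    enum Role { PARTICIPANT, OBSERVER }

    // Recipients when the loop iterates over the voting view only
    // (the behaviour restored by the diff above).
    static Set<Long> votingViewIds(Map<Long, Role> servers) {
        Set<Long> ids = new TreeSet<>();
        for (Map.Entry<Long, Role> entry : servers.entrySet()) {
            if (entry.getValue() == Role.PARTICIPANT) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    // Recipients when the loop iterates over every known server id,
    // which also includes the Observers.
    static Set<Long> allKnownIds(Map<Long, Role> servers) {
        return new TreeSet<>(servers.keySet());
    }

    // Made-up ensemble: ids and roles are assumptions for this example only.
    static Map<Long, Role> sampleEnsemble() {
        Map<Long, Role> servers = new LinkedHashMap<>();
        servers.put(1L, Role.PARTICIPANT);
        servers.put(2L, Role.PARTICIPANT);
        servers.put(3L, Role.PARTICIPANT);
        servers.put(11L, Role.OBSERVER);
        servers.put(12L, Role.OBSERVER);
        servers.put(13L, Role.OBSERVER);
        return servers;
    }

    public static void main(String[] args) {
        Map<Long, Role> servers = sampleEnsemble();
        System.out.println("voting view only: " + votingViewIds(servers));
        System.out.println("all known ids:    " + allKnownIds(servers));
    }
}
```

In this model an Observer that walks all known ids addresses every other Observer (and itself), which matches the `sid` values seen in the log; walking the voting view limits the recipients to participants.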

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Raul Gutierrez Segales
>             Fix For: 3.5.0
>
>
> Hey [~shralex],
> I noticed today that my Observers are spamming each other trying to open connections
> to the election port. I've got tons of these:
> {noformat}
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
> {noformat}
> and so on and so on, ad nauseam.
> Now, looking around, I found this inside FastLeaderElection.java from when you committed
> ZOOKEEPER-107:
> {noformat}
>      private void sendNotifications() {
> -        for (QuorumServer server : self.getVotingView().values()) {
> -            long sid = server.id;
> -
> +        for (long sid : self.getAllKnownServerIds()) {
> +            QuorumVerifier qv = self.getQuorumVerifier();
> {noformat}
> Is that really desired? I suspect that is what's causing Observers to try to connect
> to each other (as opposed to just connecting to participants). I'll give it a try now and
> let you know. (Also, we use observer ids that are > 0, and I saw some parts of the code
> that might not handle that case, so it could be that too.)



--
This message was sent by Atlassian JIRA
(v6.1#6144)
