zookeeper-dev mailing list archives

From "Alexander Shraer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
Date Wed, 06 Nov 2013 01:18:19 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Shraer updated ZOOKEEPER-1807:
----------------------------------------

    Attachment: ZOOKEEPER-1807-ver2.patch

I thought about this some more but couldn't come up with a scenario that requires more than
sending notifications to the participants of the *current* view during FLE, as was done
before ZK-107, so I'm reverting this change for now in the attached patch. In the process
I added two tests that involve observers and reconfigurations, and found a small
NullPointerException bug in QuorumPeer, which is also fixed here.
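
To make the difference between the two loops in the quoted diff concrete, here is a minimal
sketch. It does not use the real ZooKeeper classes: NotificationTargets, Role and both
methods are hypothetical stand-ins that only model which server ids would receive a
notification under each approach.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model; these are NOT the actual ZooKeeper types.
public class NotificationTargets {
    enum Role { PARTICIPANT, OBSERVER }

    // Pre-ZK-107 behaviour (restored by the patch): notify only the
    // participants of the current view.
    static Set<Long> votingViewTargets(Map<Long, Role> view) {
        Set<Long> targets = new TreeSet<>();
        for (Map.Entry<Long, Role> e : view.entrySet()) {
            if (e.getValue() == Role.PARTICIPANT) {
                targets.add(e.getKey());
            }
        }
        return targets;
    }

    // Behaviour introduced with ZK-107: notify every known server id,
    // observers included.
    static Set<Long> allKnownTargets(Map<Long, Role> view) {
        return new TreeSet<>(view.keySet());
    }

    public static void main(String[] args) {
        Map<Long, Role> view = new HashMap<>();
        view.put(1L, Role.PARTICIPANT);
        view.put(2L, Role.PARTICIPANT);
        view.put(3L, Role.PARTICIPANT);
        view.put(9L, Role.OBSERVER);
        view.put(13L, Role.OBSERVER);

        System.out.println("voting view only: " + votingViewTargets(view));
        System.out.println("all known ids:    " + allKnownTargets(view));
    }
}
```

In this toy view, the second method also targets the observers, which is the kind of extra
observer-to-observer traffic the reporter describes.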

Note that a leader still has to contact both the old and the new view to commit a reconfig
when it comes up, so it's important that new observers/participants know who the chosen leader
is. I think they will know, because we start them with a configuration containing the previous
one plus themselves, so they will initiate connections to the servers of the previous config
and get FLE responses with the leader info.

The change made in ZK-107 that is quoted in this Jira had an issue: we still only wait for
a quorum of the current view before terminating FLE, so all the extra messages may just
as well have been lost or never sent. So either we wait for both old and new quorums in FLE,
or we send just to the old servers like before.
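
As a rough illustration of why the extra messages don't affect termination, here is a small
sketch. It assumes plain majority quorums, and FleQuorumSketch and hasMajority are
hypothetical names rather than ZooKeeper's QuorumVerifier API: only responders that belong
to the current view's voters count towards the quorum.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch assuming simple majority quorums; not ZooKeeper code.
public class FleQuorumSketch {
    // True if the responders that are voters of the current view form a
    // strict majority of those voters. Responders outside the current
    // view (e.g. observers or new-view-only servers) are ignored.
    static boolean hasMajority(Set<Long> currentVoters, Set<Long> responders) {
        int count = 0;
        for (long sid : responders) {
            if (currentVoters.contains(sid)) {
                count++;
            }
        }
        return count > currentVoters.size() / 2;
    }

    public static void main(String[] args) {
        Set<Long> voters = new HashSet<>(Arrays.asList(1L, 2L, 3L));

        // Two of three current voters answered: the election can terminate.
        System.out.println(hasMajority(voters, new HashSet<>(Arrays.asList(1L, 2L))));

        // One voter plus several non-voters answered: still no quorum.
        System.out.println(hasMajority(voters, new HashSet<>(Arrays.asList(1L, 9L, 10L, 13L))));
    }
}
```

Under these assumptions, responses from servers outside the current view never change the
outcome of the check, whether or not notifications were sent to them.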

Also, notice that the configuration being sent is the committed one, not the proposed one.
So if a server A is a participant in the new view (maybe it was an observer in the old view,
but that doesn't matter), then anyone able to convince it that it's a participant has already
adopted the new config (knows that it was committed), and so sees A as a participant and will
send a message to A even if it only sends notifications to the participants of its current
view.

Not completely sure I have this right, so if you think I'm wrong please let me know. [~breed],
[~fpj]

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch, ZOOKEEPER-1807.patch, notifications-loop.png
>
>
> Hey [~shralex],
> I noticed today that my Observers are spamming each other trying to open connections to the election port. I've got tons of these:
> {noformat}
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
> {noformat}
> and so on and so on, ad nauseam.
> Now, looking around I found this inside FastLeaderElection.java from when you committed ZOOKEEPER-107:
> {noformat}
>      private void sendNotifications() {
> -        for (QuorumServer server : self.getVotingView().values()) {
> -            long sid = server.id;
> -
> +        for (long sid : self.getAllKnownServerIds()) {
> +            QuorumVerifier qv = self.getQuorumVerifier();
> {noformat}
> Is that really desired? I suspect that is what's causing Observers to try to connect to each other (as opposed to just connecting to participants). I'll give it a try now and let you know. (Also, we use observer ids that are > 0, and I saw some parts of the code that might not deal with that, so it could be that too.)



--
This message was sent by Atlassian JIRA
(v6.1#6144)
