zookeeper-dev mailing list archives

From "Brian Nixon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ZOOKEEPER-3394) Delay observer reconnect when all learner masters have been tried
Date Tue, 14 May 2019 03:52:00 GMT
Brian Nixon created ZOOKEEPER-3394:

             Summary: Delay observer reconnect when all learner masters have been tried
                 Key: ZOOKEEPER-3394
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3394
             Project: ZooKeeper
          Issue Type: Improvement
          Components: quorum
    Affects Versions: 3.6.0
            Reporter: Brian Nixon

Observers disconnect when the voting peers perform a leader election and reconnect afterward.
The delay zookeeper.observer.reconnectDelayMs was added to insulate the voting peers from
the returning observers. With a large number of peers and the observerMaster feature active,
this delay is mostly detrimental: it makes an observer more likely to get hung up connecting
to a bad (down/corrupt) peer when it would be better off switching quickly to a new one.

To retain the protective virtue of the delay, it makes sense to add a delay that applies only
after all observer masters in the list have been tried, before iterating through the list
again. In the case where observer masters are not active, this degenerates to a delay between
connection attempts on the leader.
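
The proposed behavior can be sketched roughly as follows. This is an illustrative model only,
not ZooKeeper's actual observer code: the class name, the connect() method, the tryConnect
predicate, and the fullPassDelayMs parameter are all hypothetical, and the sketch records the
delay as an event rather than sleeping so that the ordering is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the proposal; names here are not ZooKeeper APIs. */
final class ObserverReconnectSketch {

    /**
     * Tries each learner master in list order, wrapping around at the end.
     * A delay is recorded only after a complete pass over the list has failed,
     * never between individual masters.
     *
     * Returns the sequence of events ("try:<host>", "delay:<ms>",
     * "connected:<host>"). A real implementation would sleep instead of
     * recording the delay.
     */
    static List<String> connect(List<String> masters,
                                Predicate<String> tryConnect,
                                long fullPassDelayMs,
                                int maxAttempts) {
        List<String> events = new ArrayList<>();
        if (masters.isEmpty()) {
            return events;
        }
        int index = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String master = masters.get(index);
            events.add("try:" + master);
            if (tryConnect.test(master)) {
                events.add("connected:" + master);
                return events;
            }
            // Move on to the next master immediately, with no delay.
            index = (index + 1) % masters.size();
            if (index == 0) {
                // Wrapped around: every master has been tried once in this
                // pass, so back off before starting the next pass.
                events.add("delay:" + fullPassDelayMs);
            }
        }
        return events;
    }
}
```

With three unreachable masters a, b, c and a limit of four attempts, the sketch yields
try:a, try:b, try:c, delay:1000, try:a. With a single-entry list (only the leader), the index
wraps after every attempt, so the delay falls between each connection attempt, matching the
degenerate case described above.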

