zookeeper-user mailing list archives

From Flavio Junqueira <...@apache.org>
Subject Re: Tickling the election ports
Date Sat, 03 Jun 2017 14:48:00 GMT
Hi Ben,

To your points:

> On 02 Jun 2017, at 23:46, Ben Sherman <bensherman@gmail.com> wrote:
> Hi all,
> Regarding my recent outages, I have a suspicion that there is some stateful
> connection tracking happening between my servers that is invisible to me.
> (In this case, it's across availability zones in AWS VPCs).
> This has come up in both a JIRA ticket at
> https://issues.apache.org/jira/browse/ZOOKEEPER-1748 and a PR in the git
> repo at https://github.com/apache/zookeeper/pull/83
> I believe that when an ensemble is started, connections are set up
> between each pair of servers on port 3888 (among others). As the servers
> are normally healthy, there is no traffic across those connections beyond
> the initial election. At some point, with no traffic, the black-box NAT
> device removes the connection from its state table without sending a FIN
> or RST down the pipe, so the service thinks the connection still exists.
> During a failure, ZK will attempt to send traffic down that pipe for a new
> election, but it won't work, and it will have to wait for the system
> timeouts to kill the connection.
> Am I correct in the following assumptions:
> 1. When an ensemble is healthy, no traffic goes across the election ports.

Yes, no election notifications are sent.
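Because a healthy ensemble sends nothing on the election ports, the failure mode Ben describes can be illustrated with a toy Python model of a stateful NAT table. This is a sketch, not a model of any real AWS component: it assumes a single fixed idle timeout (the 300-second value is arbitrary), that eviction sends no FIN or RST to either side, and a logical clock in seconds. `StatefulNat`, `open_flow` and `forward` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulNat:
    """Toy stateful NAT table that silently evicts idle flows (hypothetical model)."""

    idle_timeout: float  # seconds of silence before eviction (arbitrary, assumed)
    last_seen: dict = field(default_factory=dict)  # flow -> time of last packet

    def open_flow(self, flow, now):
        # The connection is set up, e.g. during the initial leader election.
        self.last_seen[flow] = now

    def forward(self, flow, now):
        """Return True if a packet on `flow` sent at time `now` gets through."""
        last = self.last_seen.get(flow)
        if last is None or now - last > self.idle_timeout:
            # The entry has expired: drop it silently. No FIN or RST reaches
            # either endpoint, so both peers still think the socket is open.
            self.last_seen.pop(flow, None)
            return False
        # Any traffic refreshes the entry and resets the idle clock.
        self.last_seen[flow] = now
        return True


nat = StatefulNat(idle_timeout=300)
flow = ("zk1", "zk2", 3888)
nat.open_flow(flow, now=0)
print(nat.forward(flow, now=100))     # True: the flow is still tracked
print(nat.forward(flow, now=10_000))  # False: evicted while the ensemble was idle
```

In this model, sending anything on the flow more often than `idle_timeout` keeps the entry alive, which is the effect "tickling" the election ports would have.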

> 2. There is no way to trigger traffic across those ports (four-letter
> command or otherwise) without causing a failure in the ensemble.

I'm afraid not. In fact, a single failure doesn't necessarily induce traffic on all
connections unless you hit the leader.

> 3. I can cause traffic on those ports across the entire ensemble should I
> restart any node in the ensemble.

Not really; the only way to induce traffic on all connections is to hit the leader. If you
crash a follower and the leader still has a quorum of followers, then no notifications are
sent. If you bring that server back up, there will be some notifications, but they won't be
all-to-all, only from that server to the rest of the ensemble.
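The two cases above can be sketched as a small, simplified Python model that counts which pairwise election connections carry notifications. It assumes one connection per server pair and ignores timing and message contents; `all_connections` and `connections_exercised` are hypothetical helpers, not ZooKeeper code.

```python
from itertools import combinations


def all_connections(servers):
    """Every pairwise election connection in the ensemble (one per server pair)."""
    return {frozenset(pair) for pair in combinations(servers, 2)}


def connections_exercised(servers, restarted, restarted_is_leader):
    """Connections that carry election notifications when `restarted` is
    crashed and brought back up (simplified model of the cases above)."""
    if restarted_is_leader:
        # Losing the leader sends every server into a new election,
        # so notifications end up flowing on every connection.
        return all_connections(servers)
    # Crashing a follower while the leader keeps a quorum sends nothing;
    # when the follower returns, it notifies the rest of the ensemble, so
    # only the connections touching that server carry traffic.
    return {frozenset((restarted, other)) for other in servers if other != restarted}


servers = [1, 2, 3, 4, 5]
print(len(all_connections(servers)))                  # 10
print(len(connections_exercised(servers, 3, False)))  # 4
print(len(connections_exercised(servers, 1, True)))   # 10
```

For a five-server ensemble there are 5 × 4 / 2 = 10 connections; restarting a follower exercises only the 4 that touch it, while hitting the leader exercises all 10.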

> Finally, is there any way to shine any light on the above issues that
> highlight this? I have considered forking 3.4.10 to do this, but the
> overhead required is more than I can afford right now going down the line.

I'm not sure I understand the question, why do you want to fork?

