zookeeper-user mailing list archives

From Flavio Junqueira <...@apache.org>
Subject Re: Working around Leader election Listener thread death
Date Sun, 28 Aug 2016 14:52:05 GMT
Hi Guy,

We don't have a way to restart the listener thread, so you really need to bounce the server.
I don't think there is a way of doing this without forcing a leader election, assuming all
your servers are in this bad state. To minimize downtime, one thing you can do is avoid
bouncing the current leader until it loses quorum support. Once it loses quorum support, you
have a quorum of healthy servers, and they will elect a new, healthy leader. At that point,
you can bounce the remaining unhealthy servers.
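
As a rough sketch of that ordering (not part of ZooKeeper; the function name, the role
labels, and the example ensemble are illustrative assumptions), the restart plan could be
computed as below. It assumes you already know each server's role, for example from the
"Mode:" line of the `srvr` four-letter command, and that servers are bounced one at a time.

```python
def plan_restart(roles):
    """Split an ensemble into two restart waves.

    roles: dict mapping server id -> "leader" or "follower" (hypothetical input,
    gathered by the operator beforehand).

    Returns (first_wave, second_wave):
      first_wave  - followers to bounce first; once all of them are back up,
                    a majority of the ensemble has a working listener thread
                    and can elect a new leader.
      second_wave - the remaining unhealthy servers, with the old leader last.
    """
    leaders = [sid for sid, role in roles.items() if role == "leader"]
    if len(leaders) != 1:
        raise ValueError("expected exactly one leader, found %d" % len(leaders))
    leader = leaders[0]

    # Followers first (sorted only to make the plan deterministic), leader last.
    followers = sorted(sid for sid in roles if sid != leader)
    order = followers + [leader]

    # Majority quorum size for an ensemble of n servers.
    majority = len(roles) // 2 + 1

    return order[:majority], order[majority:]


if __name__ == "__main__":
    # Example 5-server ensemble in which server 3 is the current leader.
    ensemble = {1: "follower", 2: "follower", 3: "leader", 4: "follower", 5: "follower"}
    first_wave, second_wave = plan_restart(ensemble)
    print("bounce first:", first_wave)
    print("bounce after the new leader is elected:", second_wave)
```

The old leader is deliberately placed at the very end so that it keeps serving clients
for as long as it still has a quorum of followers.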

You may also want to move to a later 3.4 release that includes the fix for ZOOKEEPER-2186.

> On 24 Aug 2016, at 23:15, Guy Laden <guy.laden@gmail.com> wrote:
> Hi all,
> It looks like due to a security scan sending "bad" traffic to the leader
> election port, we have clusters in which
> the leader election Listener thread is dead (unchecked exception was thrown
> and thread died - seen in the log).
> (This seems to be fixed in
> https://issues.apache.org/jira/browse/ZOOKEEPER-2186)
> In this state, when a healthy server comes up and tries to connect to the
> quorum, it gets stuck on
> the leader election. It establishes TCP connections to the other servers
> but any traffic it sends seems
> to get stuck in the receiver's TCP Recv queue (seen with netstat), and is
> not read/processed by zk.
> Not a good place to be :)
> This is with 3.4.6
> Is there a way to get such clusters back to a healthy state without loss of
> quorum / client impact?
> Some way of re-starting the listener thread? or restarting the servers in a
> certain order?
> e.g. If I restart a minority, say the ones with lower server ids - is
> there a way to get the majority servers
> to re-initiate leader election connections with them so as to connect them
> to the quorum? (and to do this without
> the majority losing quorum).
> Thanks,
> Guy
