zookeeper-user mailing list archives

From Luke Stephenson <luke.stephen...@gmail.com>
Subject High CPU usage on zookeeper clients when cluster is down
Date Thu, 19 Jun 2014 22:14:59 GMT
Hello,

I'm running approximately 20 Java processes on one host.  Each process
connects to zookeeper, but places very little load on it.  The
zookeeper cluster consists of 9 nodes.

When the zookeeper cluster is healthy, all is well.  However, when the
zookeeper cluster goes down, the clients create significant load on the
host as they attempt to reconnect to zookeeper.

Each zookeeper client attempts to connect to each of the 9 nodes listed in
the zookeeper cluster, in succession.  If the connection to every host
fails, it waits 1 second before trying again.  So every second I've got 180
attempted connections on one host (20 clients x 9 nodes).  I already had a
problem with the zookeeper cluster being down; now the clients are creating
excessive load as well, compounding the issue.

This is the code I've narrowed it down to.  Unfortunately, the 1
second delay between attempts is hard-coded.
https://github.com/apache/zookeeper/blob/release-3.4.6/src/java/main/org/apache/zookeeper/ClientCnxn.java#L940
        private void startConnect() throws IOException {
            state = States.CONNECTING;

            InetSocketAddress addr;
            if (rwServerAddress != null) {
                addr = rwServerAddress;
                rwServerAddress = null;
            } else {
                addr = hostProvider.next(1000);  // the hard-coded 1 second delay
            }
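
As a rough check on the 180 figure, here is a minimal sketch of the retry pattern described above. It is a simplified model, not the actual ZooKeeper implementation: the class and method names are hypothetical, time is simulated rather than measured, and it assumes every connection attempt fails instantly, so the only pause is the delay after each full pass through the server list.

```java
/** Simplified model of the reconnect loop; NOT ZooKeeper's own code. */
final class ReconnectLoadModel {

    /**
     * Counts the connection attempts one client makes in windowMillis of
     * simulated time. Assumption: every attempt fails instantly, and the
     * client pauses spinDelayMillis after each full pass over the servers.
     */
    static long attemptsByOneClient(int serverCount, long spinDelayMillis, long windowMillis) {
        if (serverCount <= 0 || spinDelayMillis <= 0) {
            throw new IllegalArgumentException("serverCount and spinDelayMillis must be positive");
        }
        long clock = 0;     // simulated elapsed time in milliseconds
        long attempts = 0;
        int index = 0;      // position in the round-robin server list
        while (clock < windowMillis) {
            attempts++;                         // try the server at 'index'; assume it fails at once
            index = (index + 1) % serverCount;
            if (index == 0) {
                clock += spinDelayMillis;       // full pass done: wait before the next pass
            }
        }
        return attempts;
    }

    /** Total attempts on the host when clientCount identical clients run side by side. */
    static long hostAttempts(int clientCount, int serverCount, long spinDelayMillis, long windowMillis) {
        return clientCount * attemptsByOneClient(serverCount, spinDelayMillis, windowMillis);
    }

    public static void main(String[] args) {
        // 20 clients, 9 servers, 1000 ms delay, measured over a 1000 ms window.
        System.out.println(hostAttempts(20, 9, 1000, 1000));   // prints 180
        // Same clients, each given a single endpoint instead of 9.
        System.out.println(hostAttempts(20, 1, 1000, 1000));   // prints 20
    }
}
```

Under these assumptions the per-host rate is simply clients x servers per delay interval, so it scales with the length of the server list. Attempts that block until a timeout instead of failing at once would lower the rate.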

Is the typical pattern to use a load balancer so that the client only
specifies one endpoint and as a result only attempts to establish 1
connection per second?  Any other recommendations?

I would have thought this was a common problem, but my searches failed to
find existing discussions on it.

Thanks

Luke

PS Apologies if you have received this twice.  I initially posted from
Nabble, which appears to have failed.
