zookeeper-user mailing list archives

From stephelu <luke.stephen...@gmail.com>
Subject High CPU usage on zookeeper clients when cluster is down
Date Thu, 19 Jun 2014 08:53:23 GMT
Hello,

I'm running approximately 20 Java processes on one host.  Each process
connects to ZooKeeper but places very little load on it.  The
ZooKeeper cluster consists of 9 nodes.

When the ZooKeeper cluster is healthy, all is well.  However, when the
cluster goes down, the clients create significant load on the host
as they attempt to reconnect.

Each ZooKeeper client tries each of the 9 nodes listed for the cluster,
in succession.  If the connections to all hosts fail, it waits 1 second
before trying again.  So every second I've got up to 180 (20 processes ×
9 nodes) connection attempts on one host.  I already had a problem with
the ZooKeeper cluster being down; now the clients are compounding the
issue by creating excessive load as well.
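A rough worked version of the 180 figure, assuming (as described above) that each client tries every host once per round, that failed attempts return immediately, and that the 1-second wait between rounds is the only delay. The symbols are labels introduced here: P is the number of client processes, N the number of hosts each client tries, and T_spin the wait between rounds.

```latex
R_{\max} = \frac{P \cdot N}{T_{\text{spin}}}
         = \frac{20 \times 9}{1\ \text{s}}
         = 180\ \text{attempts/s}
```

Attempts that hang until a timeout stretch each round, so the real rate can only be lower than this bound. With a single endpoint (N = 1) the same bound is 20 attempts/s.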

I've narrowed it down to the code below.  Unfortunately, the 1-second
delay between rounds of attempts (the 1000 ms passed to hostProvider.next())
is hard-coded:
https://github.com/apache/zookeeper/blob/release-3.4.6/src/java/main/org/apache/zookeeper/ClientCnxn.java#L940
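        // Excerpt from ClientCnxn.java (release-3.4.6); the remainder of startConnect() is omitted.
        private void startConnect() throws IOException {
            state = States.CONNECTING;

            InetSocketAddress addr;
            if (rwServerAddress != null) {
                addr = rwServerAddress;
                rwServerAddress = null;
            } else {
                // Next host from the host list; waits 1000 ms once every host has been tried.
                addr = hostProvider.next(1000);
            }
            // ...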
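To make the behaviour of hostProvider.next(1000) concrete, here is a small simulation. It is a hypothetical model, not ZooKeeper code: `SpinDelayHostProvider`, `attemptsWithin` and the host names are illustrative, and the model only mimics the contract the excerpt relies on (hosts handed out round-robin, with a wait of `spinDelay` ms once every host has been tried). A virtual clock stands in for `Thread.sleep`, and every connection attempt is assumed to fail instantly, so the results are upper bounds. It needs Java 9+ for `List.of`.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the reconnect loop described above.
 * NOT ZooKeeper's StaticHostProvider: it only mimics the contract
 * "return hosts round-robin, wait spinDelay ms once every host has been tried".
 */
public class ReconnectSimulation {

    /** Round-robin host picker that advances a virtual clock instead of sleeping. */
    static final class SpinDelayHostProvider {
        private final List<String> hosts;
        private int index = -1;       // position of the host returned last
        private int triedInPass = 0;  // hosts handed out since the last spin delay
        private long virtualTimeMs = 0;

        SpinDelayHostProvider(List<String> hosts) {
            this.hosts = hosts;
        }

        String next(long spinDelayMs) {
            if (triedInPass == hosts.size()) {
                virtualTimeMs += spinDelayMs; // stands in for Thread.sleep(spinDelayMs)
                triedInPass = 0;
            }
            index = (index + 1) % hosts.size();
            triedInPass++;
            return hosts.get(index);
        }

        long virtualTimeMs() {
            return virtualTimeMs;
        }
    }

    /**
     * Counts connection attempts started within windowMs of simulated time,
     * assuming every attempt fails instantly so the spin delay is the only wait.
     */
    static int attemptsWithin(List<String> hosts, long windowMs) {
        SpinDelayHostProvider provider = new SpinDelayHostProvider(hosts);
        int attempts = 0;
        while (true) {
            provider.next(1000); // the same hard-coded 1000 ms as startConnect()
            if (provider.virtualTimeMs() >= windowMs) {
                return attempts;
            }
            attempts++; // a (failing) connect to the returned host would happen here
        }
    }

    public static void main(String[] args) {
        List<String> nineNodes =
                List.of("zk1", "zk2", "zk3", "zk4", "zk5", "zk6", "zk7", "zk8", "zk9");
        List<String> singleEndpoint = List.of("zk-lb");
        long windowMs = 10_000;

        // Per-client attempts per second, averaged over a 10 s simulated window.
        System.out.println("9 hosts, attempts/s per client: "
                + attemptsWithin(nineNodes, windowMs) / 10);
        System.out.println("1 endpoint, attempts/s per client: "
                + attemptsWithin(singleEndpoint, windowMs) / 10);
    }
}
```

With 9 hosts the model starts 9 attempts per simulated second per client (180 across 20 processes); with a single endpoint it starts one, which is the rate the load-balancer question below anticipates.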

Is the typical pattern to use a load balancer, so that each client
specifies only one endpoint and therefore attempts only one connection
per second?  Any other recommendations?

I would have thought this was a common problem, but my searches failed to
find existing discussions on it.

Thanks

Luke



--
View this message in context: http://zookeeper-user.578899.n2.nabble.com/High-CPU-usage-on-zookeeper-clients-when-cluster-is-down-tp7580027.html
Sent from the zookeeper-user mailing list archive at Nabble.com.
