zookeeper-user mailing list archives

From Vitalii Tymchyshyn <...@tym.im>
Subject Re: High CPU usage on zookeeper clients when cluster is down
Date Fri, 20 Jun 2014 02:09:55 GMT
I'd say that adding some randomness here would help, e.g. waiting a random
700-1300 ms instead of the hard-coded one second, so that the clients do not
all retry at the same instant.
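
A minimal Java sketch of that idea, under stated assumptions: the class
JitteredDelay and its method nextDelayMillis are hypothetical names used only
for illustration and are not part of the ZooKeeper client API. The sketch only
shows how a delay could be drawn uniformly from 700-1300 ms rather than fixed
at 1000 ms.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for illustration; not part of ZooKeeper. */
public final class JitteredDelay {
    private JitteredDelay() {}

    /**
     * Returns a delay in milliseconds drawn uniformly from
     * [baseMillis - spreadMillis, baseMillis + spreadMillis], both inclusive.
     */
    public static long nextDelayMillis(long baseMillis, long spreadMillis) {
        if (spreadMillis < 0 || spreadMillis > baseMillis) {
            throw new IllegalArgumentException("spreadMillis must be in [0, baseMillis]");
        }
        long low = baseMillis - spreadMillis;
        long high = baseMillis + spreadMillis;
        // nextLong(origin, bound) excludes bound, so add 1 to include high.
        return ThreadLocalRandom.current().nextLong(low, high + 1);
    }
}
```

A retry loop could then sleep for JitteredDelay.nextDelayMillis(1000, 300)
milliseconds between rounds, which keeps the average wait at one second while
spreading the reconnect attempts of the processes over time.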


2014-06-19 18:14 GMT-04:00 Luke Stephenson <luke.stephenson@gmail.com>:

> Hello,
>
> I'm running approximately 20 java processes on one host.  Each process
> connects to zookeeper, but places very little load on zookeeper.  The
> zookeeper cluster consists of 9 nodes.
>
> When the zookeeper cluster is healthy, all is well.  However when the
> zookeeper cluster goes down, the clients create significant load on the
> host as they attempt to reconnect to zookeeper.
>
> Each zookeeper client attempts to connect to each of the 9 nodes listed in
> the zookeeper cluster, in succession.  If the connection to every host
> fails, it will wait 1 second before trying again.  So every second I've got
> 180 attempted connections (20 processes x 9 nodes) on one host.  I already
> had a problem with the zookeeper cluster being down; now the clients are
> creating excessive load as well, compounding the issue.
>
> This is the code which I've narrowed it down to.  Unfortunately the 1
> second delay between attempts is hard coded.
>
> https://github.com/apache/zookeeper/blob/release-3.4.6/src/java/main/org/apache/zookeeper/ClientCnxn.java#L940
>         private void startConnect() throws IOException {
>             state = States.CONNECTING;
>
>             InetSocketAddress addr;
>             if (rwServerAddress != null) {
>                 addr = rwServerAddress;
>                 rwServerAddress = null;
>             } else {
>                 addr = hostProvider.next(1000);
>             }
>
> Is the typical pattern to use a load balancer so that the client only
> specifies one endpoint and as a result only attempts to establish 1
> connection per second?  Any other recommendations?
>
> I would have thought this was a common problem, but my searches failed to
> find existing discussions on it.
>
> Thanks
>
> Luke
>
> PS Apologies if you have received this twice.  I initially published from
> nabble which appears to have failed.
>
