With my implementation of a ZK client, I see that just about all the time
there are around 2000 open socket connections to ZK according to netstat!
Many of them are in the TIME_WAIT state and will go away, but enough new
ones get created to keep the count fairly steady. Eventually ZK gets into
a state in which I can't even connect with zkCli. On the web, I read that
one should always be prepared to retry ZK API calls because they can fail
for any number of reasons. I implemented a wrapper for each of the ZK
calls I make that retries the operation once, and this did eliminate the
random ConnectionLoss KeeperExceptions I was seeing.
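Roughly, each wrapper has this shape (getData shown as a representative
example; getDataWithRetry is just an illustrative name, and the real
wrappers pass whatever paths, watch flags, and Stat objects each call
needs):

private byte[] getDataWithRetry(String path)
        throws KeeperException, InterruptedException {
    connectZooKeeper();
    try {
        return zk.getData(path, false, null);
    } catch (KeeperException.ConnectionLossException e) {
        // Re-check the connection and retry exactly once.
        connectZooKeeper();
        return zk.getData(path, false, null);
    }
}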
I also implemented the following method, which is called before every ZK
operation to check that I have a valid ZK connection:
private void connectZooKeeper() {
    final String methodName = "connectZooKeeper";
    if (zk == null || zk.getState() != States.CONNECTED) {
        // Discard any stale handle before creating a new one.
        if (zk != null) {
            close();
        }
        try {
            zk = new ZooKeeper(connectString, sessionTimeout, this);
            // Poll until the client reports CONNECTED or we give up.
            int connectAttempts = 0;
            while (zk.getState() != States.CONNECTED &&
                   connectAttempts < MAX_ZK_CONNECT_ATTEMPTS) {
                try {
                    Thread.sleep(ZK_CONNECT_WAIT);
                } catch (InterruptedException e) {
                    // Ignore
                }
                connectAttempts++;
            }
        } catch (IOException e) {
            trace.exception(CLASS_NAME, methodName, e);
        }
        if (zk.getState() != States.CONNECTED) {
            trace.textError(CLASS_NAME, methodName,
                    "Unable to connect to ZooKeeper!");
        }
    }
}
Here, close() simply calls ZooKeeper.close(). sessionTimeout is five
seconds. MAX_ZK_CONNECT_ATTEMPTS is 40 and ZK_CONNECT_WAIT is 50 ms, for
a maximum wait of two seconds (which I think is too short, as I have seen
cases in which I traced the "Unable to connect to ZooKeeper!" message).
Am I doing something poorly here that could be causing the excessively
large number of TCP connections? It would seem that getState() returns
something other than CONNECTED far more often than I expect, though I
have not yet traced this to confirm. (On my to-do list.)
We are using ZK 3.3.4, which is what ships with the version of Kafka we
are using. Obviously not current. Would upgrading to the current ZK
version fix this problem?
Thanks!
Chris