zookeeper-user mailing list archives

From Chris Barlock <barl...@us.ibm.com>
Subject ZooKeeper TCP Port Connection Problem
Date Thu, 22 Jan 2015 18:01:55 GMT
With my implementation of a ZK client, I see that just about all the time 
there are around 2000 open socket connections to ZK according to 
netstat.  Many of them are in the TIME_WAIT state and will go away, but 
enough new ones get created to keep the count fairly steady.  Eventually ZK 
gets into a state in which I can't even connect with zkCli.  On the web, I 
read that one should always be prepared to retry ZK API calls because they 
can fail for any number of reasons.  I implemented wrapper methods for each 
of the ZK calls I make that retry the operation once, and this did eliminate 
the random ConnectionLoss KeeperExceptions I was seeing.  I also implemented 
this method, which is called before every ZK operation to check that I have 
a valid ZK connection:

    private void connectZooKeeper() {
        final String methodName = "connectZooKeeper";
        if (zk == null || zk.getState() != States.CONNECTED) {
            // Discard the existing handle before creating a new one.
            if (zk != null) {
                close();
            }
            try {
                zk = new ZooKeeper(connectString, sessionTimeout, this);
                int connectAttempts = 0;
                // Poll until connected or the attempt budget is used up.
                while (zk.getState() != States.CONNECTED
                        && connectAttempts < MAX_ZK_CONNECT_ATTEMPTS) {
                    try {
                        Thread.sleep(ZK_CONNECT_WAIT);
                    } catch (InterruptedException e) {
                        // Ignore
                    }
                    connectAttempts++;
                }
            } catch (IOException e) {
                trace.exception(CLASS_NAME, methodName, e);
            }
            if (zk.getState() != States.CONNECTED) {
                trace.textError(CLASS_NAME, methodName,
                        "Unable to connect to ZooKeeper!");
            }
        }
    }

Here, close() simply calls ZooKeeper.close().  sessionTimeout is five 
seconds.  MAX_ZK_CONNECT_ATTEMPTS is 40 and ZK_CONNECT_WAIT is 50 ms, for a 
maximum wait of two seconds (which I think is too short, as I have seen 
cases in which I traced the "Unable to connect to ZooKeeper!" message).
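
To make the timing concrete: 40 attempts at 50 ms each bounds the wait at 
2000 ms, which is less than the five-second session timeout.  Below is a 
minimal sketch of that kind of bounded polling loop, with the ZooKeeper 
state check replaced by a generic condition so it runs on its own.  The 
class and method names (BoundedWait, maxWaitMillis, await) are illustrative 
assumptions, not part of the client above, and unlike the method above this 
sketch stops waiting on interrupt instead of ignoring it.

```java
import java.util.function.BooleanSupplier;

/** Sketch only: polls a condition a bounded number of times, sleeping between checks. */
final class BoundedWait {
    private BoundedWait() {}

    /** Upper bound on the time spent sleeping: maxAttempts * waitMillis. */
    static long maxWaitMillis(int maxAttempts, long waitMillis) {
        return (long) maxAttempts * waitMillis;
    }

    /**
     * Returns true as soon as the condition holds, or the condition's final
     * value once maxAttempts sleeps have elapsed.
     */
    static boolean await(BooleanSupplier condition, int maxAttempts, long waitMillis) {
        int attempts = 0;
        while (!condition.getAsBoolean() && attempts < maxAttempts) {
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                // Assumption: restore the interrupt flag and give up early
                // rather than swallowing the interrupt.
                Thread.currentThread().interrupt();
                break;
            }
            attempts++;
        }
        return condition.getAsBoolean();
    }
}
```

With the values quoted above, BoundedWait.maxWaitMillis(40, 50) is 2000.  
Raising either constant widens the window, at the cost of blocking the 
caller for longer before the "Unable to connect" trace.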

Am I doing something poorly here that could be causing the excessively 
large number of TCP connections?  It would seem that getState is not 
CONNECTED far more frequently than I expect, though I have not yet traced 
this to confirm.  (On my to-do list.)

We are using ZK 3.3.4, which is what ships with the version of Kafka we 
are using.  Obviously, not current.  Would stepping up to the current ZK 
version fix this problem?
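
For context, the retry-once wrappers mentioned at the top follow roughly 
the shape below.  This is a simplified sketch under stated assumptions, not 
the actual client code: the RetryOnce name is hypothetical, the operation is 
a plain Callable instead of a ZooKeeper call so the example is 
self-contained, and every exception is treated as retryable, whereas a real 
wrapper would probably retry only on ConnectionLoss.

```java
import java.util.concurrent.Callable;

/** Sketch only: runs an operation and retries it exactly once on failure. */
final class RetryOnce {
    private RetryOnce() {}

    /** Runs the operation; if it throws, runs it one more time and reports that outcome. */
    static <T> T call(Callable<T> operation) throws Exception {
        try {
            return operation.call();
        } catch (Exception first) {
            // Assumption: every failure is treated as transient.
            try {
                return operation.call();
            } catch (Exception second) {
                // Keep the first failure attached for diagnostics.
                if (second != first) {
                    second.addSuppressed(first);
                }
                throw second;
            }
        }
    }
}
```

A call site would look like RetryOnce.call(() -> someOperation()), where 
someOperation stands in for whichever ZK call is being wrapped.  Note that 
a wrapper like this only hides a transient failure; it does nothing about 
the number of connections being opened underneath it.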

