zookeeper-user mailing list archives

From John Lindwall <jlindw...@yahoo.com.INVALID>
Subject Re: Client hangs waiting for connection
Date Mon, 26 Jun 2017 17:11:30 GMT
The problem has been solved; closing the loop here.

ZooKeeper was behaving properly.  A configuration issue caused the code
to try to connect to a ZooKeeper server that a firewall was blocking,
so the connection was never established and the client waited forever.
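
For anyone who finds this thread later: the hang came from an unbounded
CountDownLatch.await(), so an unreachable server blocks the caller forever.
Below is a minimal sketch of the "timeout plus a few retries" idea raised
earlier in the thread. It is an illustration under stated assumptions, not
the code we run. BoundedConnect, Opener, Closer and connectWithRetry are
hypothetical names and not part of the ZooKeeper API; only
CountDownLatch.await(long, TimeUnit) is a real library call here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; the names here are illustrative, not ZooKeeper API.
final class BoundedConnect {

    /** Opens a client and arranges for the latch to be counted down once connected. */
    interface Opener<T> {
        T open(CountDownLatch connected) throws Exception;
    }

    /** Releases a client that never reached the connected state. */
    interface Closer<T> {
        void close(T client) throws Exception;
    }

    static <T> T connectWithRetry(Opener<T> opener, Closer<T> closer,
                                  long timeoutMillis, int maxAttempts) throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CountDownLatch connected = new CountDownLatch(1);
            T client = opener.open(connected);
            // Bounded wait: await returns false if the latch was not counted down in time.
            if (connected.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return client;
            }
            // Close the unconnected client so it is not leaked across retries.
            closer.close(client);
        }
        throw new TimeoutException("not connected after " + maxAttempts + " attempts");
    }
}
```

With ZooKeeper, the opener would presumably construct the client with a
watcher that counts the latch down on KeeperState.SyncConnected, and the
closer would call close() on the client. A timeout like this only makes the
failure visible; it would not have fixed the firewall misconfiguration.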

Thanks to everyone for chiming in!
John

Abraham Fine wrote:
> Would it be possible to include the rest of the jstack? It appears that
> this is just the thread waiting on the latch, which doesn't tell us why
> the latch has not been counted down. Also, did ZK produce any interesting
> logs?
>
> Thanks,
> Abe
>
> On Tue, Jun 20, 2017, at 17:23, John Lindwall wrote:
>> Thanks for the reply! I forgot to include the thread dump that I have
>> collected.  This process has been hung for almost a day so I'm guessing
>> it'll never connect properly ;-)  We actually had 2 such processes hung
>> today with the same stack trace (at least the same root cause as I show
>> below).  Please note that this problem is rare but supremely not good
>> when it does happen if we fail to detect it. We've been running this
>> code for many months now and this issue has only recently occurred.
>>
>> Thread 4396: (state = BLOCKED)
>>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
>>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
>>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
>>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(int) @bci=72, line=994 (Interpreted frame)
>>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(int) @bci=24, line=1303 (Interpreted frame)
>>  - java.util.concurrent.CountDownLatch.await() @bci=5, line=236 (Interpreted frame)
>>  - com.mycode.ZooKeeperFactory.connect(java.lang.String, int) @bci=34, line=59 (Interpreted frame)
>> ...
>> [remainder of stack trace omitted]
>>
>> John
>>
>>
>> Michael Han wrote:
>>> Sounds like a deadlock in the client library. One idea is to instrument
>>> your client code and dump the thread stacks when the wait times out. The
>>> dump will hopefully contain the states of the various threads and provide
>>> some insight into what to look for next.
>>>
>>> On Tue, Jun 20, 2017 at 3:14 PM, John Lindwall<jlindwall@yahoo.com.invalid>
>>> wrote:
>>>
>>>> We are seeing some occasional incidents where a ZooKeeper Java client will
>>>> hang in CountDownLatch.await() while waiting for a connection to be
>>>> established.  Our connect() code is pretty standard, I think, and is
>>>> similar to this:
>>>>
>>>>       private ZooKeeper connect(String hosts, int sessionTimeout)
>>>>               throws IOException, InterruptedException {
>>>>           final CountDownLatch connectedSignal = new CountDownLatch(1);
>>>>
>>>>           ZooKeeper zk = new ZooKeeper(hosts, sessionTimeout, new Watcher() {
>>>>               @Override
>>>>               public void process(WatchedEvent event) {
>>>>                   if (event.getState() == Event.KeeperState.SyncConnected) {
>>>>                       connectedSignal.countDown();
>>>>                   }
>>>>               }
>>>>           });
>>>>
>>>>           connectedSignal.await();
>>>>           return zk;
>>>>       }
>>>>
>>>> Has anyone else had an issue with the await() blocking forever like this?
>>>> Any advice?
>>>>
>>>> As a "fix" I am considering adding a timeout to the CountDownLatch await()
>>>> call; if we fail to connect within that timeout then retry the connection
>>>> attempt. After, say, 3 retries, give up entirely.
>>>>
>>>> Thanks!
>>>> --
>>>> John Lindwall
>>>>
>>>>
>> -- 
>> John Lindwall
>>

-- 
John Lindwall

