zookeeper-user mailing list archives

From Flavio Junqueira <...@apache.org>
Subject Re: ZooKeeperServer#shutdown hangs
Date Wed, 16 Dec 2015 22:59:11 GMT
This is bad; we should fix it and release 3.4.8 soon. With the holidays and such, we won't
be able to produce an RC and vote, so I suggest we target early January. In the meantime, I'd
suggest that users not move to 3.4.7.

I've reopened ZOOKEEPER-1907 and suggested a fix to this problem.

-Flavio 

 
> On 16 Dec 2015, at 21:01, Ted Yu <yuzhihong@gmail.com> wrote:
> 
> Logged ZOOKEEPER-2347
> 
> Thanks
> 
> On Wed, Dec 16, 2015 at 12:36 PM, Camille Fournier <camille@apache.org>
> wrote:
> 
>> Blergh. We made shutdown synchronized. But decrementing the requests is
>> also synchronized and called from a different thread. So yeah, deadlock.
>> 
>> Can you open a ticket for this? This came in with ZOOKEEPER-1907
>> 
>> C
>> 
>> On Wed, Dec 16, 2015 at 2:46 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> 
>>> Hi,
>>> HBase recently upgraded to ZooKeeper 3.4.7.
>>> 
>>> In one of the tests, TestSplitLogManager, there is a reproducible hang at
>>> the end of the test.
>>> Below is a snippet from the stack trace related to ZooKeeper:
>>> 
>>> "main-EventThread" daemon prio=5 tid=0x00007fd27488a800 nid=0x6f1f waiting on condition [0x000000011834b000]
>>>   java.lang.Thread.State: WAITING (parking)
>>>  at sun.misc.Unsafe.park(Native Method)
>>>  - parking to wait for  <0x00000007c5b8d3a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>>>  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>>>  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>>>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
>>> 
>>> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x00007fd274eb4000 nid=0x9513 waiting on condition [0x0000000118042000]
>>>   java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>  at java.lang.Thread.sleep(Native Method)
>>>  at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>>>  at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>>>  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
>>> 
>>> "SyncThread:0" prio=5 tid=0x00007fd274d02000 nid=0x730f waiting for monitor entry [0x00000001170ac000]
>>>   java.lang.Thread.State: BLOCKED (on object monitor)
>>>  at org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>>>  - waiting to lock <0x00000007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer)
>>>  at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>>>  at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>>>  at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
>>> 
>>> "main-EventThread" daemon prio=5 tid=0x00007fd2753a3800 nid=0x711b waiting on condition [0x0000000117a30000]
>>>   java.lang.Thread.State: WAITING (parking)
>>>  at sun.misc.Unsafe.park(Native Method)
>>>  - parking to wait for  <0x00000007c9b106b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>>>  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>>>  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>>>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
>>> 
>>> "main" prio=5 tid=0x00007fd276000000 nid=0x1903 in Object.wait() [0x0000000108aa1000]
>>>   java.lang.Thread.State: WAITING (on object monitor)
>>>  at java.lang.Object.wait(Native Method)
>>>  - waiting on <*0x00000007c5b66400*> (a org.apache.zookeeper.server.SyncRequestProcessor)
>>>  at java.lang.Thread.join(Thread.java:1281)
>>>  - locked <*0x00000007c5b66400*> (a org.apache.zookeeper.server.SyncRequestProcessor)
>>>  at java.lang.Thread.join(Thread.java:1355)
>>>  at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>>>  at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>>>  at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>>>  - locked <0x00000007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer)
>>>  at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>>>  at org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
>>> 
>>> Note the bold address in the last hunk, which seems to indicate some form
>>> of deadlock.
>>> 
>>> I can send the full stack trace upon request.
>>> When reverting to 3.4.6, the test passes.
>>> 
>>> Comment / hint is welcome.
>>> 
>>> Cheers
>>> 
>> 
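
The locking pattern described in this thread (a synchronized shutdown that joins a worker thread, while that worker needs the same monitor to decrement a counter) can be reproduced in isolation. The sketch below is a simplified illustration, not ZooKeeper code: the class names LockingServer and AtomicServer, the thread name, and the timeout parameter are all hypothetical, and the join uses a timeout so the demo reports the stall instead of hanging. The AtomicInteger variant shows one possible way to avoid the contention; it is an assumption for illustration and is not claimed to be the fix proposed in ZOOKEEPER-1907 or ZOOKEEPER-2347.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownDeadlockSketch {

    /** Hypothetical stand-in for the problematic pattern: both methods need the same monitor. */
    static class LockingServer {
        private int inProcess = 1;
        private final Thread worker = new Thread(this::decInProcess, "SyncThread-sketch");

        synchronized void decInProcess() {
            inProcess--;
        }

        /** Returns true if the worker finished before the timeout elapsed. */
        synchronized boolean shutdown(long timeoutMs) {
            worker.start();
            try {
                // The server monitor stays held here, so the worker cannot enter decInProcess().
                worker.join(timeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return !worker.isAlive();
        }
    }

    /** One possible alternative (an assumption): the counter no longer needs the monitor. */
    static class AtomicServer {
        private final AtomicInteger inProcess = new AtomicInteger(1);
        private final Thread worker = new Thread(this::decInProcess, "SyncThread-sketch");

        void decInProcess() {
            inProcess.decrementAndGet();
        }

        /** Returns true if the worker finished before the timeout elapsed. */
        synchronized boolean shutdown(long timeoutMs) {
            worker.start();
            try {
                worker.join(timeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return !worker.isAlive();
        }
    }

    public static void main(String[] args) {
        // With an untimed join(), the first case would wait forever, as in the reported hang.
        System.out.println("locking variant finished: " + new LockingServer().shutdown(200));
        System.out.println("atomic variant finished:  " + new AtomicServer().shutdown(2000));
    }
}
```

In the locking variant the worker is blocked on the server's monitor for the whole duration of the join, which corresponds to the "SyncThread:0" (BLOCKED) and "main" (WAITING in Thread.join) entries in the dump above. Once the timed join gives up and shutdown returns, the monitor is released and the worker completes, so the demo itself terminates.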

