zookeeper-user mailing list archives

From Jason Rosenberg <...@squareup.com>
Subject Re: ZooKeeperServer#shutdown hangs
Date Thu, 17 Dec 2015 23:33:21 GMT
Curious if there are specific scenarios which trigger this issue.  So far
we have not seen it where we've upgraded.  We have many tests in continuous
integration that embed zookeeper servers, and so far haven't seen any
issues.

Jason

On Wed, Dec 16, 2015 at 6:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Thanks, Flavio.
>
> When 3.4.8 RC comes out, I will give it a spin.
>
> Cheers
>
> On Wed, Dec 16, 2015 at 2:59 PM, Flavio Junqueira <fpj@apache.org> wrote:
>
> > This is bad, we should fix it and release 3.4.8 soon. With the holidays
> > and such, we won't be able to produce an RC and vote, so I suggest we
> > target early Jan. In the meanwhile, I'd suggest users to not move to
> > 3.4.7.
> >
> > I've reopened ZK-1907 and suggested a fix to this problem.
> >
> > -Flavio
> >
> >
> > > On 16 Dec 2015, at 21:01, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > Logged ZOOKEEPER-2347
> > >
> > > Thanks
> > >
> > > On Wed, Dec 16, 2015 at 12:36 PM, Camille Fournier <camille@apache.org> wrote:
> > >
> > >> Blergh. We made shutdown synchronized. But decrementing the requests is
> > >> also synchronized and called from a different thread. So yeah, deadlock.
> > >>
> > >> Can you open a ticket for this? This came in with ZOOKEEPER-1907
> > >>
> > >> C
> > >>
> > >> On Wed, Dec 16, 2015 at 2:46 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >>
> > >>> Hi,
> > >>> HBase recently upgraded to zookeeper 3.4.7
> > >>>
> > >>> In one of the tests, TestSplitLogManager, there is reproducible hang at
> > >>> the end of the test.
> > >>> Below is snippet from stack trace related to zookeeper:
> > >>>
> > >>> "main-EventThread" daemon prio=5 tid=0x00007fd27488a800 nid=0x6f1f waiting on condition [0x000000011834b000]
> > >>>   java.lang.Thread.State: WAITING (parking)
> > >>>  at sun.misc.Unsafe.park(Native Method)
> > >>>  - parking to wait for  <0x00000007c5b8d3a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > >>>  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> > >>>  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> > >>>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> > >>>
> > >>> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x00007fd274eb4000 nid=0x9513 waiting on condition [0x0000000118042000]
> > >>>   java.lang.Thread.State: TIMED_WAITING (sleeping)
> > >>>  at java.lang.Thread.sleep(Native Method)
> > >>>  at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
> > >>>  at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
> > >>>  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> > >>>
> > >>> "SyncThread:0" prio=5 tid=0x00007fd274d02000 nid=0x730f waiting for monitor entry [0x00000001170ac000]
> > >>>   java.lang.Thread.State: BLOCKED (on object monitor)
> > >>>  at org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
> > >>>  - waiting to lock <0x00000007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer)
> > >>>  at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
> > >>>  at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
> > >>>  at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> > >>>
> > >>> "main-EventThread" daemon prio=5 tid=0x00007fd2753a3800 nid=0x711b waiting on condition [0x0000000117a30000]
> > >>>   java.lang.Thread.State: WAITING (parking)
> > >>>  at sun.misc.Unsafe.park(Native Method)
> > >>>  - parking to wait for  <0x00000007c9b106b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > >>>  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> > >>>  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> > >>>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> > >>>
> > >>> "main" prio=5 tid=0x00007fd276000000 nid=0x1903 in Object.wait() [0x0000000108aa1000]
> > >>>   java.lang.Thread.State: WAITING (on object monitor)
> > >>>  at java.lang.Object.wait(Native Method)
> > >>>  - waiting on <*0x00000007c5b66400*> (a org.apache.zookeeper.server.SyncRequestProcessor)
> > >>>  at java.lang.Thread.join(Thread.java:1281)
> > >>>  - locked <*0x00000007c5b66400*> (a org.apache.zookeeper.server.SyncRequestProcessor)
> > >>>  at java.lang.Thread.join(Thread.java:1355)
> > >>>  at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
> > >>>  at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
> > >>>  at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
> > >>>  - locked <0x00000007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer)
> > >>>  at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
> > >>>  at org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> > >>>
> > >>> Note the bold address in the last hunk which seems to indicate some form
> > >>> of deadlock.
> > >>>
> > >>> I can send the full stack trace upon request.
> > >>> When reverting to 3.4.6, the test passes.
> > >>>
> > >>> Comment / hint is welcome.
> > >>>
> > >>> Cheers
> > >>>
> > >>
> >
> >
>
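
For readers who want to see the pattern in isolation, here is a minimal sketch of the deadlock described in the thread: a synchronized shutdown() joins a worker thread while that worker is blocked trying to enter another synchronized method on the same object. Every class and method name below is hypothetical and only loosely mirrors ZooKeeperServer; none of this is the actual ZooKeeper code. The join uses a timeout so the demo terminates (the real shutdown path joins without one), and the AtomicInteger variant is just one possible remedy, not necessarily the fix that was committed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class ShutdownDeadlockSketch {

    /** Hypothetical stand-in for the server object; not the real ZooKeeperServer. */
    abstract static class Server {
        final CountDownLatch inShutdown = new CountDownLatch(1);

        abstract void decInProcess();

        /** Holds the monitor on this object while waiting for the worker to exit. */
        synchronized boolean shutdown(Thread worker, long timeoutMs) throws InterruptedException {
            inShutdown.countDown();   // from here on, the monitor on `this` is held
            worker.join(timeoutMs);   // bounded only so that the demo can finish
            return !worker.isAlive(); // false means the worker was still stuck
        }
    }

    /** The pattern from the thread: the counter update needs the same monitor as shutdown(). */
    static final class SynchronizedServer extends Server {
        private int requestsInProcess = 1;

        @Override
        synchronized void decInProcess() {
            requestsInProcess--;
        }
    }

    /** One possible remedy (an assumption): a lock-free counter, so no monitor is needed. */
    static final class AtomicCounterServer extends Server {
        private final AtomicInteger requestsInProcess = new AtomicInteger(1);

        @Override
        void decInProcess() {
            requestsInProcess.decrementAndGet();
        }
    }

    /** Returns true if the worker finished while shutdown() was still waiting for it. */
    static boolean shutdownCompletes(Server server) {
        Thread worker = new Thread(() -> {
            try {
                server.inShutdown.await(); // wait until shutdown() owns the monitor
                server.decInProcess();     // blocks here in the synchronized variant
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "SyncThread-sketch");
        worker.setDaemon(true);
        worker.start();
        try {
            return server.shutdown(worker, 500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("synchronized counter, shutdown completed: "
                + shutdownCompletes(new SynchronizedServer()));
        System.out.println("atomic counter, shutdown completed: "
                + shutdownCompletes(new AtomicCounterServer()));
    }
}
```

With the synchronized counter the worker stays BLOCKED for as long as shutdown() holds the monitor, which matches the "SyncThread:0" and "main" entries in the stack trace above; with the atomic counter the worker exits and the join returns promptly.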
