zookeeper-user mailing list archives

From Jason Rosenberg <...@squareup.com>
Subject Re: ZooKeeperServer#shutdown hangs
Date Fri, 18 Dec 2015 00:08:00 GMT
Yep,

I'm now able to reproduce it intermittently (though not a high percentage of
the time) in some of our tests, so I'm reverting.

Thanks,

Jason

On Thu, Dec 17, 2015 at 6:39 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Jason:
> See the following test which revealed the deadlock scenario:
>
>
> https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java
>
> On Jenkins, hbase build has been flaky where sometimes the above test hung
> but sometimes it passed.
>
> I tend to think that this bug should be fixed for production system.
>
> Cheers
>
> On Thu, Dec 17, 2015 at 3:33 PM, Jason Rosenberg <jbr@squareup.com> wrote:
>
> > Curious if there are specific scenarios which trigger this issue.  So far
> > we have not seen it where we've upgraded.  We have many tests in continuous
> > integration that embed zookeeper servers, and so far haven't seen any
> > issues.
> >
> > Jason
> >
> > On Wed, Dec 16, 2015 at 6:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Thanks, Flavio.
> > >
> > > When 3.4.8 RC comes out, I will give it a spin.
> > >
> > > Cheers
> > >
> > > On Wed, Dec 16, 2015 at 2:59 PM, Flavio Junqueira <fpj@apache.org> wrote:
> > >
> > > > This is bad, we should fix it and release 3.4.8 soon. With the holidays
> > > > and such, we won't be able to produce an RC and vote, so I suggest we
> > > > target early Jan. In the meanwhile, I'd suggest users to not move to 3.4.7.
> > > >
> > > > I've reopened ZK-1907 and suggested a fix to this problem.
> > > >
> > > > -Flavio
> > > >
> > > >
> > > > > On 16 Dec 2015, at 21:01, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > Logged ZOOKEEPER-2347
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Wed, Dec 16, 2015 at 12:36 PM, Camille Fournier <camille@apache.org> wrote:
> > > > >
> > > > > >> Blergh. We made shutdown synchronized. But decrementing the requests is
> > > > > >> also synchronized and called from a different thread. So yeah, deadlock.
> > > > >>
> > > > >> Can you open a ticket for this? This came in with ZOOKEEPER-1907
> > > > >>
> > > > >> C
> > > > >>
> > > > > >> On Wed, Dec 16, 2015 at 2:46 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>
> > > > >>> Hi,
> > > > >>> HBase recently upgraded to zookeeper 3.4.7
> > > > >>>
> > > > > >>> In one of the tests, TestSplitLogManager, there is reproducible hang at the
> > > > > >>> end of the test.
> > > > >>> Below is snippet from stack trace related to zookeeper:
> > > > >>>
> > > > >>> "main-EventThread" daemon prio=5 tid=0x00007fd27488a800
> nid=0x6f1f
> > > > >> waiting
> > > > >>> on condition [0x000000011834b000]
> > > > >>>   java.lang.Thread.State: WAITING (parking)
> > > > >>>  at sun.misc.Unsafe.park(Native Method)
> > > > >>>  - parking to wait for  <0x00000007c5b8d3a0> (a
> > > > >>>
> > > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > > > >>>  at
> > java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> > > > >>>  at
> > > > >>>
> > > > >>
> > > >
> > >
> >
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> > > > >>>  at
> > > > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> > > > >>>
> > > > >>> "main-SendThread(localhost:59510)" daemon prio=5
> > > tid=0x00007fd274eb4000
> > > > >>> nid=0x9513 waiting on condition [0x0000000118042000]
> > > > >>>   java.lang.Thread.State: TIMED_WAITING (sleeping)
> > > > >>>  at java.lang.Thread.sleep(Native Method)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
> > > > >>>  at
> > > > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> > > > >>>
> > > > >>> "SyncThread:0" prio=5 tid=0x00007fd274d02000 nid=0x730f waiting
> for
> > > > >> monitor
> > > > >>> entry [0x00000001170ac000]
> > > > >>>   java.lang.Thread.State: BLOCKED (on object monitor)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
> > > > >>>  - waiting to lock <0x00000007c5b62128> (a
> > > > >>> org.apache.zookeeper.server.ZooKeeperServer)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> > > > >>>
> > > > >>> "main-EventThread" daemon prio=5 tid=0x00007fd2753a3800
> nid=0x711b
> > > > >> waiting
> > > > >>> on condition [0x0000000117a30000]
> > > > >>>   java.lang.Thread.State: WAITING (parking)
> > > > >>>  at sun.misc.Unsafe.park(Native Method)
> > > > >>>  - parking to wait for  <0x00000007c9b106b8> (a
> > > > >>>
> > > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > > > >>>  at
> > java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> > > > >>>  at
> > > > >>>
> > > > >>
> > > >
> > >
> >
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> > > > >>>  at
> > > > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> > > > >>>
> > > > >>> "main" prio=5 tid=0x00007fd276000000 nid=0x1903 in Object.wait()
> > > > >>> [0x0000000108aa1000]
> > > > >>>   java.lang.Thread.State: WAITING (on object monitor)
> > > > >>>  at java.lang.Object.wait(Native Method)
> > > > >>>  - waiting on <*0x00000007c5b66400*> (a
> > > > >>> org.apache.zookeeper.server.SyncRequestProcessor)
> > > > >>>  at java.lang.Thread.join(Thread.java:1281)
> > > > >>>  - locked <*0x00000007c5b66400*> (a
> > > > >>> org.apache.zookeeper.server.SyncRequestProcessor)
> > > > >>>  at java.lang.Thread.join(Thread.java:1355)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
> > > > >>>  - locked <0x00000007c5b62128> (a
> > > > >>> org.apache.zookeeper.server.ZooKeeperServer)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
> > > > >>>  at
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> > > > >>>
> > > > > >>> Note the bold address in the last hunk which seems to indicate some form of
> > > > > >>> deadlock.
> > > > >>>
> > > > >>> I can send the full stack trace upon request.
> > > > >>> When reverting to 3.4.6, the test passes.
> > > > >>>
> > > > >>> Comment / hint is welcome.
> > > > >>>
> > > > >>> Cheers
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> >
>
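
For readers following the thread, the lock pattern in the quoted stack trace
can be sketched in isolation. The code below is a minimal illustration, not
ZooKeeper code: the Server class, its fields, and the workerExitsDuringShutdown
helper are hypothetical stand-ins. A synchronized shutdown() joins a worker
thread while that worker needs the same monitor to decrement a counter. The
join uses a timeout so the demo terminates instead of hanging. The atomic
counter variant is one possible way out of the cycle and is shown only as an
assumption, not necessarily the fix that was committed for ZOOKEEPER-2347.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownDeadlockSketch {

    /** Hypothetical stand-in for a server; not the real ZooKeeperServer. */
    static class Server {
        private final boolean lockedCounter;
        private int inProcess = 1;
        private final AtomicInteger inProcessAtomic = new AtomicInteger(1);
        // Released once shutdown() owns the server monitor.
        private final CountDownLatch shutdownHoldsLock = new CountDownLatch(1);
        private final Thread syncThread;

        Server(boolean lockedCounter) {
            this.lockedCounter = lockedCounter;
            this.syncThread = new Thread(() -> {
                try {
                    // Wait until shutdown() is inside its synchronized body.
                    shutdownHoldsLock.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                decInProcess();
            }, "SyncThread-sketch");
            this.syncThread.setDaemon(true);
        }

        void decInProcess() {
            if (lockedCounter) {
                // Needs the same monitor that shutdown() is holding.
                synchronized (this) {
                    inProcess--;
                }
            } else {
                // Lock-free decrement; never contends with shutdown().
                inProcessAtomic.decrementAndGet();
            }
        }

        /** Returns true if the worker exited while shutdown() held the monitor. */
        synchronized boolean shutdown(long joinMillis) throws InterruptedException {
            shutdownHoldsLock.countDown();
            // An untimed join() here would wait forever in the locked variant.
            syncThread.join(joinMillis);
            return !syncThread.isAlive();
        }
    }

    public static boolean workerExitsDuringShutdown(boolean lockedCounter) {
        Server server = new Server(lockedCounter);
        server.syncThread.start();
        try {
            return server.shutdown(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("synchronized counter, worker exited: "
                + workerExitsDuringShutdown(true));
        System.out.println("atomic counter, worker exited: "
                + workerExitsDuringShutdown(false));
    }
}
```

With the synchronized counter, the worker should still be blocked when the
timed join gives up, which corresponds to the BLOCKED "SyncThread:0" and the
"main" thread parked in Thread.join in the trace. With the atomic counter the
worker should finish well inside the timeout.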
