kafka-jira mailing list archives

From "Manikumar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-6549) Deadlock while processing Controller Events
Date Mon, 12 Feb 2018 15:28:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360879#comment-16360879 ]

Manikumar commented on KAFKA-6549:
----------------------------------

Below is the sequence of steps that led to the deadlock. This is a rare scenario. I will
try to come up with a solution.
 # The controller ZK node deletion and the ZK session expiration events happen in quick succession.
 # For the controller ZK node deletion, ControllerChangeHandler.handleDeletion() is called. A Controller.Reelect
event is added to the queue and the Reelect process is initiated.
 # For the ZK session expiration, ZooKeeperClientWatcher takes the WriteLock and calls the controller's
StateChangeHandler.beforeInitializingSession(). The beforeInitializingSession method adds an Expire
event to the controller queue and waits for that Expire event to complete.
 # Controller.Reelect waits for ZooKeeperClient.ReadLock.
 # ZooKeeperClientWatcher waits for the Expire event to complete, which in turn waits inside the controller
queue behind Reelect.
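
The steps above can be sketched as a small standalone Java program. This is only an
illustration of the lock/queue interleaving, not Kafka's code: the class name, the method
name and the latch names are hypothetical, a single-threaded executor stands in for the
controller event thread, and a plain ReentrantReadWriteLock stands in for the lock held by
ZooKeeperClientWatcher. The wait on the Expire event is given a timeout so that the program
terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the interleaving; none of these names come from Kafka.
class ControllerDeadlockSketch {

    // Returns true if the simulated Expire event could not be processed
    // while the simulated watcher thread held the write lock.
    static boolean expireBlockedWhileWriteLockHeld(long waitMs) {
        ReentrantReadWriteLock sessionLock = new ReentrantReadWriteLock();
        // One thread processes events in order, like a controller event thread.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        CountDownLatch reelectRunning = new CountDownLatch(1);
        CountDownLatch writeLockHeld = new CountDownLatch(1);
        CountDownLatch expireProcessed = new CountDownLatch(1);
        try {
            // Step 2: a Reelect-like event is queued and starts processing.
            eventThread.execute(() -> {
                reelectRunning.countDown();
                try {
                    // Force the rare ordering: proceed only once the write lock is taken.
                    writeLockHeld.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // Step 4: the event needs the read lock, which the watcher blocks.
                sessionLock.readLock().lock();
                sessionLock.readLock().unlock();
            });
            reelectRunning.await();

            // Step 3: the watcher (this thread) takes the write lock,
            // queues an Expire-like event and waits for it to complete.
            sessionLock.writeLock().lock();
            try {
                writeLockHeld.countDown();
                eventThread.execute(expireProcessed::countDown);
                // Step 5: Expire sits behind Reelect, so this wait cannot succeed
                // while the write lock is held. Without the timeout it never returns.
                return !expireProcessed.await(waitMs, TimeUnit.MILLISECONDS);
            } finally {
                sessionLock.writeLock().unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            writeLockHeld.countDown(); // never leave the queued task waiting
            eventThread.shutdown();    // queued events drain once the lock is released
        }
    }

    public static void main(String[] args) {
        System.out.println("Expire blocked while write lock held: "
                + expireBlockedWhileWriteLockHeld(200));
    }
}
```

In this sketch the cycle is broken only because the wait is bounded. Waiting for the Expire
event without holding the write lock would be another way out of the cycle; whether either
approach fits the actual fix is not something this comment establishes.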

> Deadlock while processing Controller Events
> -------------------------------------------
>
>                 Key: KAFKA-6549
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6549
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Manikumar
>            Assignee: Manikumar
>            Priority: Blocker
>             Fix For: 1.1.0
>
>         Attachments: td.txt
>
>
> Stack traces from a single-node test cluster that deadlocked while processing the controller Reelect and Expire events. The full stack trace is attached.
> {quote}
> "main-EventThread" #18 daemon prio=5 os_prio=31 tid=0x00007f83e4285800 nid=0x7d03 waiting on condition [0x000070000278b000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x00000007bccadf30> (a java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>  at kafka.controller.KafkaController$Expire.waitUntilProcessed(KafkaController.scala:1505)
>  at kafka.controller.KafkaController$$anon$7.beforeInitializingSession(KafkaController.scala:163)
>  at kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$$anonfun$process$2$$anonfun$apply$mcV$sp$6.apply(ZooKeeperClient.scala:365)
>  at kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$$anonfun$process$2$$anonfun$apply$mcV$sp$6.apply(ZooKeeperClient.scala:365)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
>  at kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$$anonfun$process$2.apply$mcV$sp(ZooKeeperClient.scala:365)
>  at kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$$anonfun$process$2.apply(ZooKeeperClient.scala:363)
>  at kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$$anonfun$process$2.apply(ZooKeeperClient.scala:363)
>  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
>  at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:258)
>  at kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$.process(ZooKeeperClient.scala:363)
>  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> Locked ownable synchronizers:
>  - <0x0000000780054860> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>  
> "controller-event-thread" #42 prio=5 os_prio=31 tid=0x00007f83e4293800 nid=0xad03 waiting on condition [0x0000700003fd3000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x00000007bcc584a0> (a java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>  at kafka.zookeeper.ZooKeeperClient.handleRequests(ZooKeeperClient.scala:148)
>  at kafka.zk.KafkaZkClient.retryRequestsUntilConnected(KafkaZkClient.scala:1439)
>  at kafka.zk.KafkaZkClient.kafka$zk$KafkaZkClient$$retryRequestUntilConnected(KafkaZkClient.scala:1432)
>  at kafka.zk.KafkaZkClient.registerZNodeChangeHandlerAndCheckExistence(KafkaZkClient.scala:1171)
>  at kafka.controller.KafkaController$Reelect$.process(KafkaController.scala:1475)
>  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:69)
>  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:69)
>  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:69)
>  at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
>  at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:68)
>  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
>  
> "kafka-shutdown-hook" #14 prio=5 os_prio=31 tid=0x00007f83e29b1000 nid=0x560f waiting on condition [0x0000700005208000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0000000780054860> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>  at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:248)
>  at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:256)
>  at kafka.zookeeper.ZooKeeperClient$$anonfun$handleRequests$1.apply(ZooKeeperClient.scala:135)
>  at kafka.zookeeper.ZooKeeperClient$$anonfun$handleRequests$1.apply(ZooKeeperClient.scala:132)
>  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>  at kafka.zookeeper.ZooKeeperClient.handleRequests(ZooKeeperClient.scala:132)
>  at kafka.zk.KafkaZkClient.retryRequestsUntilConnected(KafkaZkClient.scala:1439)
>  at kafka.zk.KafkaZkClient.kafka$zk$KafkaZkClient$$retryRequestUntilConnected(KafkaZkClient.scala:1432)
>  at kafka.zk.KafkaZkClient.getControllerId(KafkaZkClient.scala:862)
>  at kafka.server.KafkaServer.doControlledShutdown$1(KafkaServer.scala:458)
>  at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:534)
>  at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:556)
>  at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:85)
>  at kafka.server.KafkaServer.shutdown(KafkaServer.scala:556)
>  at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:48)
>  at kafka.Kafka$$anon$1.run(Kafka.scala:89)
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
