drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5749) Foreman and Netty threads occure deadlock
Date Sun, 10 Sep 2017 00:42:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160149#comment-16160149 ]

ASF GitHub Bot commented on DRILL-5749:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/925#discussion_r137938116
  
    --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java ---
    @@ -54,10 +52,14 @@ void channelClosed(Throwable ex) {
         isOpen.set(false);
         if (ex != null) {
           final RpcException e = RpcException.mapException(ex);
    +      IntObjectHashMap<RpcOutcome<?>> clonedMap;
           synchronized (map) {
    -        map.forEach(new SetExceptionProcedure(e));
    +        clonedMap = map.clone();
             map.clear();
           }
    +      if (clonedMap != null) {
    --- End diff --
    
    When would `clonedMap` be `null`?
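
For context, the pattern in the diff (copy the entries while holding the lock, then notify listeners after releasing it) can be sketched as follows. This is a minimal illustration, not Drill's code: `PendingRequests`, `register` and `size` are hypothetical names, and `java.util.HashMap` stands in for the HPPC `IntObjectHashMap` used by `RequestIdMap`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical stand-in for a request-id map; not Drill's RequestIdMap. */
class PendingRequests {
    private final Map<Integer, Consumer<Throwable>> map = new HashMap<>();

    /** Registers a failure listener for an outstanding request id. */
    void register(int id, Consumer<Throwable> onFailure) {
        synchronized (map) {
            map.put(id, onFailure);
        }
    }

    /** Returns the number of outstanding requests. */
    int size() {
        synchronized (map) {
            return map.size();
        }
    }

    /** Fails every outstanding request without holding the map lock during callbacks. */
    void channelClosed(Throwable cause) {
        List<Consumer<Throwable>> snapshot;
        synchronized (map) {
            // Copy the listeners and clear the map while holding the lock.
            snapshot = new ArrayList<>(map.values());
            map.clear();
        }
        // The lock is released here, so a listener that needs another lock
        // (for example, to send a cancel command) cannot form a lock cycle
        // through this map.
        for (Consumer<Throwable> listener : snapshot) {
            listener.accept(cause);
        }
    }
}
```

In this sketch `snapshot` is always assigned inside the synchronized block before it is read, so no null check is needed before the loop.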


> Foreman and Netty threads occure deadlock 
> ------------------------------------------
>
>                 Key: DRILL-5749
>                 URL: https://issues.apache.org/jira/browse/DRILL-5749
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - RPC
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: weijie.tong
>            Priority: Critical
>
> When the cluster was running many concurrent queries and a reused control connection hit
> an exception, the foreman and Netty threads each tried to acquire a lock held by the other,
> and a deadlock occurred. The Netty thread holds the map (RequestIdMap) lock and then tries
> to acquire the ReconnectingConnection lock to send a command, while the foreman thread holds
> the ReconnectingConnection lock and then tries to acquire the RequestIdMap lock.
> Below is the jstack dump:
> Found one Java-level deadlock:
> =============================
> "265aa5cb-e5e2-39ed-9c2f-7658b905372e:foreman":
>   waiting to lock monitor 0x00007f935b721f48 (object 0x0000000656affc40, a org.apache.drill.exec.rpc.control.ControlConnectionManager),
>   which is held by "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman"
> "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman":
>   waiting to lock monitor 0x00007f90de3b9648 (object 0x00000006b524d7e8, a com.carrotsearch.hppc.IntObjectHashMap),
>   which is held by "BitServer-2"
> "BitServer-2":
>   waiting to lock monitor 0x00007f935b721f48 (object 0x0000000656affc40, a org.apache.drill.exec.rpc.control.ControlConnectionManager),
>   which is held by "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman"
> Java stack information for the threads listed above:
> ===================================================
> "265aa5cb-e5e2-39ed-9c2f-7658b905372e:foreman":
> 	at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:72)
> 	- waiting to lock <0x0000000656affc40> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
> 	at org.apache.drill.exec.rpc.control.ControlTunnel.sendFragments(ControlTunnel.java:66)
> 	at org.apache.drill.exec.work.foreman.Foreman.sendRemoteFragments(Foreman.java:1210)
> 	at org.apache.drill.exec.work.foreman.Foreman.setupNonRootFragments(Foreman.java:1141)
> 	at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:454)
> 	at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1045)
> 	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> 	at java.lang.Thread.run(Thread.java:849)
> "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman":
> 	at org.apache.drill.exec.rpc.RequestIdMap.createNewRpcListener(RequestIdMap.java:87)
> 	- waiting to lock <0x00000006b524d7e8> (a com.carrotsearch.hppc.IntObjectHashMap)
> 	at org.apache.drill.exec.rpc.AbstractRemoteConnection.createNewRpcListener(AbstractRemoteConnection.java:153)
> 	at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:115)
> 	at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:89)
> 	at org.apache.drill.exec.rpc.control.ControlConnection.send(ControlConnection.java:65)
> 	at org.apache.drill.exec.rpc.control.ControlTunnel$SendFragment.doRpcCall(ControlTunnel.java:160)
> 	at org.apache.drill.exec.rpc.control.ControlTunnel$SendFragment.doRpcCall(ControlTunnel.java:150)
> 	at org.apache.drill.exec.rpc.ListeningCommand.connectionAvailable(ListeningCommand.java:38)
> 	at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:75)
> 	- locked <0x0000000656affc40> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
> 	at org.apache.drill.exec.rpc.control.ControlTunnel.sendFragments(ControlTunnel.java:66)
> 	at org.apache.drill.exec.work.foreman.Foreman.sendRemoteFragments(Foreman.java:1210)
> 	at org.apache.drill.exec.work.foreman.Foreman.setupNonRootFragments(Foreman.java:1141)
> 	at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:454)
> 	at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1045)
> 	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> 	at java.lang.Thread.run(Thread.java:849)
> "BitServer-2":
> 	at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:72)
> 	- waiting to lock <0x0000000656affc40> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
> 	at org.apache.drill.exec.rpc.control.ControlTunnel.cancelFragment(ControlTunnel.java:71)
> 	at org.apache.drill.exec.work.foreman.QueryManager.cancelExecutingFragments(QueryManager.java:220)
> 	at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:968)
> 	at org.apache.drill.exec.work.foreman.Foreman.access$2600(Foreman.java:109)
> 	at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1020)
> 	at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1013)
> 	at org.apache.drill.common.EventProcessor.processEvents(EventProcessor.java:107)
> 	at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:65)
> 	at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.addEvent(Foreman.java:1015)
> 	at org.apache.drill.exec.work.foreman.Foreman.addToEventQueue(Foreman.java:1033)
> 	at org.apache.drill.exec.work.foreman.Foreman$FragmentSubmitListener.failed(Foreman.java:1274)
> 	at org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.failed(ListeningCommand.java:50)
> 	at org.apache.drill.exec.rpc.RequestIdMap$RpcListener.setException(RequestIdMap.java:134)
> 	at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:74)
> 	at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:64)
> 	at com.carrotsearch.hppc.IntObjectHashMap.forEach(IntObjectHashMap.java:692)
> 	at org.apache.drill.exec.rpc.RequestIdMap.channelClosed(RequestIdMap.java:58)
> 	- locked <0x00000006b524d7e8> (a com.carrotsearch.hppc.IntObjectHashMap)
> 	at org.apache.drill.exec.rpc.AbstractRemoteConnection.channelClosed(AbstractRemoteConnection.java:183)
> 	at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:165)
> 	at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:142)
> 	at org.apache.drill.exec.rpc.ReconnectingConnection$CloseHandler.operationComplete(ReconnectingConnection.java:204)
> 	at org.apache.drill.exec.rpc.ReconnectingConnection$CloseHandler.operationComplete(ReconnectingConnection.java:191)
> 	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> 	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
> 	at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
> 	at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
> 	at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
> 	at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
> 	at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
> 	at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1099)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
> 	at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
> 	at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
> 	at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
> 	at io.netty.channel.ChannelDuplexHandler.close(ChannelDuplexHandler.java:73)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
> 	at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
> 	at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466)
> 	at org.apache.drill.exec.rpc.RpcExceptionHandler.exceptionCaught(RpcExceptionHandler.java:39)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
> 	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
> 	at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
> 	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
> 	at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
> 	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
> 	at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
