drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5749) Foreman and Netty threads occure deadlock
Date Tue, 29 Aug 2017 16:51:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145655#comment-16145655 ]

ASF GitHub Bot commented on DRILL-5749:
---------------------------------------

GitHub user weijietong opened a pull request:

    https://github.com/apache/drill/pull/925

    DRILL-5749: solve foreman and netty threads deadlock

    Break the nested invocation of the channelClosed method to avoid holding nested locks.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/weijietong/drill DRILL-5749

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/925.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #925
    
----
commit b44f780a948c4a0898e7cee042c0590f0713f780
Author: weijietong <tongweijie178@gmail.com>
Date:   2017-06-08T08:03:46Z

    Merge pull request #1 from apache/master
    
    sync

commit d045c757c80a759b435479cc89f33c749fc16ac2
Author: weijie.tong <weijie.tong@alipay.com>
Date:   2017-08-11T08:01:36Z

    Merge branch 'master' of github.com:weijietong/drill

commit 08b7006f4c70c45a17ebf7eae6beaa2bdb0d0454
Author: weijie.tong <weijie.tong@alipay.com>
Date:   2017-08-20T12:05:51Z

    update

commit 9e9ebb497a183e61a72665019e6e04070d912027
Author: weijie.tong <weijie.tong@alipay.com>
Date:   2017-08-20T12:07:41Z

    revert

commit 837d9fc58440fb584690f93b5f638ddcedf042a1
Author: weijie.tong <weijie.tong@alipay.com>
Date:   2017-08-22T10:35:12Z

    Merge branch 'master' of github.com:apache/drill

commit b1fc840ad9d0a9959b05a84bfd17f17067def32d
Author: weijie.tong <weijie.tong@alipay.com>
Date:   2017-08-29T16:39:48Z

    Merge branch 'master' of github.com:apache/drill

commit 03afe8650f76d182b86e2d8141780f002538f2b4
Author: weijie.tong <weijie.tong@alipay.com>
Date:   2017-08-29T16:43:21Z

    solve deadlock

----


> Foreman and Netty threads occure deadlock 
> ------------------------------------------
>
>                 Key: DRILL-5749
>                 URL: https://issues.apache.org/jira/browse/DRILL-5749
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - RPC
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: weijie.tong
>            Priority: Critical
>
> When the cluster was under high query concurrency and a reused control connection hit an
> exception, the foreman and Netty threads each tried to acquire a lock held by the other, and
> a deadlock occurred. The Netty thread holds the map (RequestIdMap) lock and then tries to
> acquire the ReconnectingConnection lock to send a command, while the foreman thread holds the
> ReconnectingConnection lock and then tries to acquire the RequestIdMap lock.
> Below is the jstack dump:
> Found one Java-level deadlock:
> =============================
> "265aa5cb-e5e2-39ed-9c2f-7658b905372e:foreman":
>   waiting to lock monitor 0x00007f935b721f48 (object 0x0000000656affc40, a org.apache.drill.exec.rpc.control.ControlConnectionManager),
>   which is held by "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman"
> "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman":
>   waiting to lock monitor 0x00007f90de3b9648 (object 0x00000006b524d7e8, a com.carrotsearch.hppc.IntObjectHashMap),
>   which is held by "BitServer-2"
> "BitServer-2":
>   waiting to lock monitor 0x00007f935b721f48 (object 0x0000000656affc40, a org.apache.drill.exec.rpc.control.ControlConnectionManager),
>   which is held by "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman"
> Java stack information for the threads listed above:
> ===================================================
> "265aa5cb-e5e2-39ed-9c2f-7658b905372e:foreman":
> 	at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:72)
> 	- waiting to lock <0x0000000656affc40> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
> 	at org.apache.drill.exec.rpc.control.ControlTunnel.sendFragments(ControlTunnel.java:66)
> 	at org.apache.drill.exec.work.foreman.Foreman.sendRemoteFragments(Foreman.java:1210)
> 	at org.apache.drill.exec.work.foreman.Foreman.setupNonRootFragments(Foreman.java:1141)
> 	at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:454)
> 	at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1045)
> 	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> 	at java.lang.Thread.run(Thread.java:849)
> "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman":
> 	at org.apache.drill.exec.rpc.RequestIdMap.createNewRpcListener(RequestIdMap.java:87)
> 	- waiting to lock <0x00000006b524d7e8> (a com.carrotsearch.hppc.IntObjectHashMap)
> 	at org.apache.drill.exec.rpc.AbstractRemoteConnection.createNewRpcListener(AbstractRemoteConnection.java:153)
> 	at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:115)
> 	at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:89)
> 	at org.apache.drill.exec.rpc.control.ControlConnection.send(ControlConnection.java:65)
> 	at org.apache.drill.exec.rpc.control.ControlTunnel$SendFragment.doRpcCall(ControlTunnel.java:160)
> 	at org.apache.drill.exec.rpc.control.ControlTunnel$SendFragment.doRpcCall(ControlTunnel.java:150)
> 	at org.apache.drill.exec.rpc.ListeningCommand.connectionAvailable(ListeningCommand.java:38)
> 	at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:75)
> 	- locked <0x0000000656affc40> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
> 	at org.apache.drill.exec.rpc.control.ControlTunnel.sendFragments(ControlTunnel.java:66)
> 	at org.apache.drill.exec.work.foreman.Foreman.sendRemoteFragments(Foreman.java:1210)
> 	at org.apache.drill.exec.work.foreman.Foreman.setupNonRootFragments(Foreman.java:1141)
> 	at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:454)
> 	at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1045)
> 	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> 	at java.lang.Thread.run(Thread.java:849)
> "BitServer-2":
> 	at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:72)
> 	- waiting to lock <0x0000000656affc40> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
> 	at org.apache.drill.exec.rpc.control.ControlTunnel.cancelFragment(ControlTunnel.java:71)
> 	at org.apache.drill.exec.work.foreman.QueryManager.cancelExecutingFragments(QueryManager.java:220)
> 	at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:968)
> 	at org.apache.drill.exec.work.foreman.Foreman.access$2600(Foreman.java:109)
> 	at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1020)
> 	at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1013)
> 	at org.apache.drill.common.EventProcessor.processEvents(EventProcessor.java:107)
> 	at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:65)
> 	at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.addEvent(Foreman.java:1015)
> 	at org.apache.drill.exec.work.foreman.Foreman.addToEventQueue(Foreman.java:1033)
> 	at org.apache.drill.exec.work.foreman.Foreman$FragmentSubmitListener.failed(Foreman.java:1274)
> 	at org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.failed(ListeningCommand.java:50)
> 	at org.apache.drill.exec.rpc.RequestIdMap$RpcListener.setException(RequestIdMap.java:134)
> 	at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:74)
> 	at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:64)
> 	at com.carrotsearch.hppc.IntObjectHashMap.forEach(IntObjectHashMap.java:692)
> 	at org.apache.drill.exec.rpc.RequestIdMap.channelClosed(RequestIdMap.java:58)
> 	- locked <0x00000006b524d7e8> (a com.carrotsearch.hppc.IntObjectHashMap)
> 	at org.apache.drill.exec.rpc.AbstractRemoteConnection.channelClosed(AbstractRemoteConnection.java:183)
> 	at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:165)
> 	at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:142)
> 	at org.apache.drill.exec.rpc.ReconnectingConnection$CloseHandler.operationComplete(ReconnectingConnection.java:204)
> 	at org.apache.drill.exec.rpc.ReconnectingConnection$CloseHandler.operationComplete(ReconnectingConnection.java:191)
> 	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> 	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
> 	at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
> 	at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
> 	at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
> 	at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
> 	at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
> 	at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1099)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
> 	at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
> 	at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
> 	at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
> 	at io.netty.channel.ChannelDuplexHandler.close(ChannelDuplexHandler.java:73)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
> 	at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
> 	at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466)
> 	at org.apache.drill.exec.rpc.RpcExceptionHandler.exceptionCaught(RpcExceptionHandler.java:39)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
> 	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
> 	at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
> 	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
> 	at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
> 	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
> 	at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
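
For readers following the dump: "BitServer-2" runs the failure callbacks while still holding the IntObjectHashMap monitor, and one of those callbacks then needs the ControlConnectionManager monitor. A common way to break this kind of lock-order inversion is to copy the listeners out while holding the map lock and invoke them only after releasing it. The sketch below illustrates that general pattern only; it is not taken from pull request #925 and may differ from what the patch does. ListenerRegistry, Listener, and holdsMapLock are hypothetical names, not Drill classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a request-id-to-listener map; NOT Drill's RequestIdMap. */
final class ListenerRegistry {

  /** Hypothetical callback type, loosely modelled on an RPC outcome listener. */
  interface Listener {
    void failed(Throwable cause);
  }

  private final Map<Integer, Listener> listeners = new HashMap<>();
  private int nextId = 0;

  /** Registers a listener under the map lock and returns its id. */
  int register(Listener listener) {
    synchronized (listeners) {
      int id = nextId++;
      listeners.put(id, listener);
      return id;
    }
  }

  /**
   * Fails every pending listener. The map lock is held only long enough to
   * snapshot and clear the entries; the callbacks run after it is released,
   * so a callback that takes another lock cannot do so while this thread
   * still owns the map monitor.
   */
  void channelClosed(Throwable cause) {
    final List<Listener> snapshot;
    synchronized (listeners) {
      snapshot = new ArrayList<>(listeners.values());
      listeners.clear();
    }
    for (Listener listener : snapshot) {
      listener.failed(cause);
    }
  }

  /** Test helper: true if the calling thread currently owns the map monitor. */
  boolean holdsMapLock() {
    return Thread.holdsLock(listeners);
  }

  public static void main(String[] args) {
    ListenerRegistry registry = new ListenerRegistry();
    registry.register(cause ->
        System.out.println("map lock held during callback: " + registry.holdsMapLock()));
    registry.channelClosed(new RuntimeException("channel closed"));
    // prints "map lock held during callback: false"
  }
}
```

The trade-off in this sketch is that a listener registered after the snapshot is taken is not notified by that channelClosed call, so a real implementation would also need to reject or fail registrations on a closed connection.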



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
