Date: Mon, 4 Apr 2016 17:49:25 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

    [ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224611#comment-15224611 ]

ASF GitHub Bot commented on DRILL-3714:
---------------------------------------

Github user jacques-n commented on the pull request:

    https://github.com/apache/drill/pull/442#issuecomment-205413400

I'm working on an alternative approach where the coordination queue is associated with each connection.
Upon review, it still needs thread protection because you have contention between sending threads and the socket thread, but it can be much simpler/cleaner. I'm still getting some test failures but hope to have a proposal up today.

> Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
> ------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3714
>                 URL: https://issues.apache.org/jira/browse/DRILL-3714
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>            Reporter: Victoria Markman
>            Assignee: Jacques Nadeau
>            Priority: Critical
>             Fix For: 1.7.0
>
>         Attachments: Screen Shot 2015-08-26 at 10.36.33 AM.png, drillbit.log, jstack.txt, query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>
>
> This is a variation of DRILL-3705 with the difference of drill behavior when hitting OOM condition.
> Query runs out of memory during execution and remains in "CANCELLATION_REQUESTED" state until drillbit is bounced.
> Client (sqlline in this case) never gets a response from the server.
> Reproduction details:
> Single node drillbit installation.
> DRILL_MAX_DIRECT_MEMORY="8G"
> DRILL_HEAP="4G"
> Run this query on TPCDS SF100 data set
> {code}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS TotalSpend FROM store_sales ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1 LIMIT 10;
> {code}
> drillbit.log
> {code}
> 2015-08-26 16:54:58,469 [2a2210a7-7a78-c774-d54c-c863d0b77bb0:frag:3:22] INFO o.a.d.e.w.f.FragmentStatusReporter - 2a2210a7-7a78-c774-d54c-c863d0b77bb0:3:22: State to report: RUNNING
> 2015-08-26 16:55:50,498 [BitServer-5] WARN o.a.drill.exec.rpc.data.DataServer - Message of mode REQUEST of rpc type 3 took longer than 500ms. Actual duration was 2569ms.
> 2015-08-26 16:56:31,086 [BitServer-5] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.88.133:31012 <--> /10.10.88.133:54554 (data server). Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
>     at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_71]
>     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.7.0_71]
>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) ~[na:1.7.0_71]
>     at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.buffer.PoolArena.reallocate(PoolArena.java:280) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:110) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.buffer.WrappedByteBuf.writeBytes(WrappedByteBuf.java:600) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.buffer.UnsafeDirectLittleEndian.writeBytes(UnsafeDirectLittleEndian.java:28) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:4.0.27.Final]
>     at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:227) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>     ... 11 common frames omitted
> 2015-08-26 16:56:31,087 [BitServer-5] INFO o.a.d.exec.rpc.ProtobufLengthDecoder - Channel is closed, discarding remaining 124958 byte(s) in buffer.
> 2015-08-26 16:56:31,087 [BitClient-1] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.88.133:54554 <--> /10.10.88.133:31012 (data client). Closing connection.
> java.io.IOException: syscall:read(...)() failed: Connection reset by peer
> 2015-08-26 16:56:31,088 [BitClient-1] INFO o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.133:54554 <--> /10.10.88.133:31012.
> 2015-08-26 16:56:35,325 [BitServer-6] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.88.133:31012 <--> /10.10.88.133:54555 (data server). Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> {code}
> Attached:
> query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
> drillbit.log
> jstack.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
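
As an illustration of the per-connection coordination queue proposed in the comment at the top of this message, here is a minimal Java sketch. It is not the code from the pull request or from Drill itself: the class name ConnectionCoordinationQueue, its methods, and the use of CompletableFuture<String> as the response type are all hypothetical. It assumes that sending threads register outstanding requests while a single socket thread either completes them or fails them all when the channel closes, which is the contention the comment says still needs thread protection.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-connection tracker for in-flight RPC requests (not Drill's actual class). */
final class ConnectionCoordinationQueue {
    // Outstanding requests for this one connection, keyed by coordination id.
    private final ConcurrentHashMap<Integer, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();
    private volatile boolean closed = false;

    /** Called by sending threads: registers a request and returns its coordination id. */
    int register(CompletableFuture<String> outcome) {
        int id = nextId.incrementAndGet();
        pending.put(id, outcome);
        // Re-check after publishing so a close racing with this call still fails the request.
        if (closed && pending.remove(id) != null) {
            outcome.completeExceptionally(new IllegalStateException("connection closed"));
        }
        return id;
    }

    /** Called by the socket thread when a response arrives; false if the id is unknown. */
    boolean complete(int id, String response) {
        CompletableFuture<String> outcome = pending.remove(id);
        return outcome != null && outcome.complete(response);
    }

    /** Called when the channel closes: fails every outstanding request, returns how many. */
    int failAll(Throwable cause) {
        closed = true;
        int failed = 0;
        for (Integer id : pending.keySet()) {
            CompletableFuture<String> outcome = pending.remove(id);
            if (outcome != null) {
                outcome.completeExceptionally(cause);
                failed++;
            }
        }
        return failed;
    }
}
```

Tying the map to one connection means a closed channel, such as the ones in the log above, can fail exactly its own outstanding requests instead of leaving their senders waiting indefinitely.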