Date: Mon, 26 Oct 2015 17:30:28 +0000 (UTC)
From: "Zelaine Fong (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Updated] (DRILL-3705) Query runs out of memory, reported as FAILED and leaves thread running

     [ https://issues.apache.org/jira/browse/DRILL-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zelaine Fong updated DRILL-3705:
--------------------------------
    Assignee: Sudheesh Katkam

> Query runs out of memory, reported as FAILED and leaves thread running
> -----------------------------------------------------------------------
>
> Key: DRILL-3705
> URL: https://issues.apache.org/jira/browse/DRILL-3705
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.2.0
> Reporter: Victoria Markman
> Assignee: Sudheesh Katkam
> Priority: Critical
> Fix For: 1.3.0
>
> Attachments: 2a2451ec-09d8-9f26-e856-5fd349ae72fd.sys.drill, drillbit.log, jstack.txt
>
>
> Single node drill installation
> DRILL_MAX_DIRECT_MEMORY="2G"
> DRILL_HEAP="1G"
> Execute tpcds query 15 SF100 (parquet) with the settings above. Reproduces 2 out of 3 times.
> {code}
> SELECT ca.ca_zip,
>        Sum(cs.cs_sales_price)
> FROM catalog_sales cs,
>      customer c,
>      customer_address ca,
>      date_dim dd
> WHERE cs.cs_bill_customer_sk = c.c_customer_sk
>   AND c.c_current_addr_sk = ca.ca_address_sk
>   AND ( Substr(ca.ca_zip, 1, 5) IN ( '85669', '86197', '88274', '83405',
>                                      '86475', '85392', '85460', '80348',
>                                      '81792' )
>         OR ca.ca_state IN ( 'CA', 'WA', 'GA' )
>         OR cs.cs_sales_price > 500 )
>   AND cs.cs_sold_date_sk = dd.d_date_sk
>   AND dd.d_qoy = 1
>   AND dd.d_year = 1998
> GROUP BY ca.ca_zip
> ORDER BY ca.ca_zip
> LIMIT 100;
> {code}
> Query runs out of memory, but leaves thread behind even though it is reported as FAILED (expected result)
> Snippet from jstack:
> {code}
> "2a2451ec-09d8-9f26-e856-5fd349ae72fd:frag:4:0" daemon prio=10 tid=0x00007f5074140000 nid=0x3000 waiting on condition [0x00007f5055b66000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000c012b038> (a java.util.concurrent.Semaphore$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
>     at java.util.concurrent.Semaphore.acquire(Semaphore.java:472)
>     at org.apache.drill.exec.ops.SendingAccountor.waitForSendComplete(SendingAccountor.java:48)
>     - locked <0x00000000c012b068> (a org.apache.drill.exec.ops.SendingAccountor)
>     at org.apache.drill.exec.ops.FragmentContext.waitForSendComplete(FragmentContext.java:436)
>     at org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:112)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:341)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:173)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292)
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> NPE in drillbit.log:
> {code}
> 2015-08-24 23:52:04,486 [BitServer-5] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.88.133:31012 <--> /10.10.88.133:52417 (data server). Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.NullPointerException
>     at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:99) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:150) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.NullPointerException: null
>     at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.checkForOutOfMemory(UnlimitedRawBatchBuffer.java:68) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>     at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.handleOutOfMemory(BaseRawBatchBuffer.java:95) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>     at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.enqueue(BaseRawBatchBuffer.java:83) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>     at org.apache.drill.exec.work.batch.AbstractDataCollector.batchArrived(AbstractDataCollector.java:105) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>     at org.apache.drill.exec.work.batch.IncomingBuffers.batchArrived(IncomingBuffers.java:75) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.NonRootFragmentManager.handle(NonRootFragmentManager.java:73) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>     at org.apache.drill.exec.rpc.data.DataResponseHandlerImpl.handle(DataResponseHandlerImpl.java:48) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>     at org.apache.drill.exec.rpc.data.DataServer.send(DataServer.java:176) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>     at org.apache.drill.exec.rpc.data.DataServer.handle(DataServer.java:142) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>     at org.apache.drill.exec.rpc.data.DataServer.handle(DataServer.java:51) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>     at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>     at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>     at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>     ... 20 common frames omitted
> 2015-08-24 23:52:04,489 [BitClient-1] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.88.133:52417 <--> /10.10.88.133:31012 (data client). Closing connection.
> java.io.IOException: syscall:read(...)() failed: Connection reset by peer
> 2015-08-24 23:52:04,489 [BitClient-1] INFO o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.133:52417 <--> /10.10.88.133:31012.
> 2015-08-24 23:52:04,505 [BitServer-6] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.88.133:31012 <--> /10.10.88.133:52418 (data server). Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.NullPointerException
> {code}
> Attached:
> drillbit.log
> 2a2451ec-09d8-9f26-e856-5fd349ae72fd.sys.drill (query profile)
> jstack.txt (stack output for the running drillbit)
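
In the jstack snippet, the fragment thread is parked in Semaphore.acquire under SendingAccountor.waitForSendComplete. The sketch below only illustrates that general pattern: a thread waiting on a semaphore for send acknowledgements never returns if one acknowledgement never arrives, whereas a bounded wait reports the problem. It is not Drill's code, and it makes no claim about the root cause or the eventual fix. The class SendAccountingSketch, the nested Accountor, and all method names and timeouts are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SendAccountingSketch {

    // Hypothetical stand-in for a send accountor; NOT Drill's SendingAccountor.
    public static final class Accountor {
        private final Semaphore completed = new Semaphore(0);
        private int outstanding = 0;

        // Record that one batch has been handed to the network layer.
        public synchronized void sendStarted() {
            outstanding++;
        }

        // Called by the (hypothetical) RPC layer when a send is acknowledged.
        // Deliberately not synchronized, so it can run while a waiter holds the lock.
        public void sendCompleted() {
            completed.release();
        }

        // Bounded wait: returns false instead of parking forever when an
        // acknowledgement is lost (for example, the connection was closed).
        public synchronized boolean waitForSendComplete(long timeoutMillis)
                throws InterruptedException {
            boolean ok = completed.tryAcquire(outstanding, timeoutMillis, TimeUnit.MILLISECONDS);
            if (ok) {
                outstanding = 0;
            }
            return ok;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Accountor accountor = new Accountor();
        accountor.sendStarted();
        accountor.sendStarted();
        accountor.sendCompleted(); // the second acknowledgement never arrives

        // An untimed acquire would block here indefinitely; the timed variant returns.
        boolean done = accountor.waitForSendComplete(100);
        System.out.println("all sends acknowledged: " + done);
    }
}
```

With an untimed acquire in place of tryAcquire, the main thread in this sketch would stay parked, much like the frag:4:0 thread in the snippet above.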