drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5499) TPC-DS Query 4 concurrent execution RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
Date Fri, 19 May 2017 17:01:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017661#comment-16017661
] 

Khurram Faraaz commented on DRILL-5499:
---------------------------------------

I haven't tried this on 1.10.0, this was a new test, I will verify if this is also seen on
Drill 1.10.0.
We hit this after we discussed about testing with a query that used subsequently large amount
of main memory and then running that same query concurrently, to observe memory usage behavior,
as one of the failure injection tests. Aim of test was to reduce the amount total main memory
available to the Drillbit while the concurrent queries were being executed. 
Even before reducing the total available main memory to the Drillbit, we saw the RESOURCE
ERROR message.

> TPC-DS Query 4 concurrent execution RESOURCE ERROR: One or more nodes ran out of memory
while executing the query.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5499
>                 URL: https://issues.apache.org/jira/browse/DRILL-5499
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.11.0
>         Environment: single node CentOS
>            Reporter: Khurram Faraaz
>
> Running TPC-DS query 4 sequentially does not result in error/OOM. TPC-DS query 4 takes
7 GB memory on an average, on Drill 1.11.0.
> However, there is an OOM and RESOURCE ERROR when we run TPC-DS query 4 concurrently using
8 threads on a single node that has 24 processing units available.
> {noformat}
> 2017-05-09 13:42:30,547 [BitServer-5] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception
in RPC communication.  Connection: /10.10.100.201:31012 <--> /10.10.100.201:59746 (data
server).  Closing connection.
> io.netty.handler.codec.DecoderException: org.apache.drill.exec.exception.OutOfMemoryException:
Unable to allocate buffer of size 25165824 due to memory limit. Current allocation: 21037056
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate buffer
of size 25165824 due to memory limit. Current allocation: 21037056
>         at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220)
~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195)
~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at io.netty.buffer.ExpandableByteBuf.capacity(ExpandableByteBuf.java:43) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
>         at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92)
~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:227)
~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         ... 13 common frames omitted
> ...
> 2017-05-09 13:42:30,547 [26ee38c2-75c8-05cd-0cc0-9263fefba955:frag:34:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 26ee38c2-75c8-05cd-0cc0-9263fefba955:34:0: State change requested AWAITING_ALLOCATION -->
RUNNING
> 2017-05-09 13:42:30,547 [26ee38c2-75c8-05cd-0cc0-9263fefba955:frag:34:0] INFO  o.a.d.e.w.f.FragmentStatusReporter
- 26ee38c2-75c8-05cd-0cc0-9263fefba955:34:0: State to report: RUNNING
> 2017-05-09 13:42:30,547 [BitServer-5] INFO  o.a.d.exec.rpc.ProtobufLengthDecoder - Channel
is closed, discarding remaining 3022732 byte(s) in buffer.
> 2017-05-09 13:42:30,548 [BitClient-1] WARN  o.a.d.exec.rpc.RpcExceptionHandler - Exception
occurred with closed channel.  Connection: /10.10.100.201:59746 <--> centos-01.qa.lab/10.10.100.201:31012
(data client)
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_91]
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_91]
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_91]
>         at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.8.0_91]
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_91]
>         at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:407) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:30)
~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
>         at io.netty.buffer.DrillBuf.setBytes(DrillBuf.java:770) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
>         at io.netty.buffer.MutableWrappedByteBuf.setBytes(MutableWrappedByteBuf.java:280)
~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
>         at io.netty.buffer.ExpandableByteBuf.setBytes(ExpandableByteBuf.java:26) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
>         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)
~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> 2017-05-09 13:42:30,548 [26ee38c2-7f3c-b492-5d47-97553d90ab1f:frag:72:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- User Error Occurred: One or more nodes ran out of memory while executing the query. (Unable
to allocate buffer of size 262144 due to memory limit. Current allocation: 3690496)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran
out of memory while executing the query.
> Unable to allocate buffer of size 262144 due to memory limit. Current allocation: 3690496
> [Error Id: 181248c2-bf8e-401e-9936-9b4cb7c2d213 ]
>         at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244)
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_91]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_91]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate buffer
of size 262144 due to memory limit. Current allocation: 3690496
>         at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220)
~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195)
~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.UInt4Vector.reAlloc(UInt4Vector.java:212) ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.UInt4Vector$Mutator.setSafe(UInt4Vector.java:409)
~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:605)
~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setSafe(NullableVarCharVector.java:560)
~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.test.generated.ProjectorGen12364.doEval(ProjectorTemplate.java:73)
~[na:na]
>         at org.apache.drill.exec.test.generated.ProjectorGen12364.projectRecords(ProjectorTemplate.java:67)
~[na:na]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:199)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.broadcastsender.BroadcastSenderRootExec.innerNext(BroadcastSenderRootExec.java:95)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_91]
>         at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_91]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
~[hadoop-common-2.7.0-mapr-1607.jar:na]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         ... 4 common frames omitted
> 2017-05-09 13:42:30,548 [26ee38c2-7f3c-b492-5d47-97553d90ab1f:frag:72:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 26ee38c2-7f3c-b492-5d47-97553d90ab1f:72:0: State change requested RUNNING --> FAILED
> {noformat}
> nproc --all; returned 24 on centos-01 node
> memory utilization (increase) while the concurrent program was under execution (i.e it
was executing TPC-DS query 4 from many Threads)
> [root@centos-01 ~]# ps_mem --total -p 11371
> 3068859904
> [root@centos-01 ~]# ps_mem --total -p 11371
> 5967798784
> [root@centos-01 ~]# ps_mem --total -p 11371
> 7694078464
> [root@centos-01 ~]# ps_mem --total -p 11371
> 10252224512
> [root@centos-01 ~]# ps_mem --total -p 11371
> 10254476288
> The concurrent Java program used in test is available here, this is reproducible
> https://github.com/kfaraaz/tests/blob/master/src/ConcurrentQuery.java
> ps_mem tool is available here - https://github.com/pixelb/ps_mem
> ps_mem : A utility to accurately report the core memory usage for a program
>     



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message