drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Deneche A. Hakim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
Date Tue, 29 Mar 2016 06:12:25 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215507#comment-15215507
] 

Deneche A. Hakim commented on DRILL-3714:
-----------------------------------------

[~jacques@dremio.com] assuming this is indeed the test case requested, here is my understanding:

The fix proposed for this JIRA is helpful for any message waiting, in the CoordinationQueue,
for an ACK. In the case of the UserClient, it only really waits for the handshake and the
queryId for every submitted query, other than that he CoordinationQueue should be empty most
of the time.
The fix I proposed here may help the UserClient fail quickly if it's waiting for a queryId,
but other than that it will still hang for any query it didn't receive the terminal state
yet (DRILL-3743). It's a problem worth fixing, but I think it should be done separately.

> Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is
restarted
> ------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3714
>                 URL: https://issues.apache.org/jira/browse/DRILL-3714
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>            Reporter: Victoria Markman
>            Assignee: Jacques Nadeau
>            Priority: Critical
>             Fix For: 1.7.0
>
>         Attachments: Screen Shot 2015-08-26 at 10.36.33 AM.png, drillbit.log, jstack.txt,
query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>
>
> This is a variation of DRILL-3705 with the difference of drill behavior when hitting
OOM condition.
> Query runs out of memory during execution and remains in "CANCELLATION_REQUESTED" state
until drillbit is bounced.
> Client (sqlline in this case) never gets a response from the server.
> Reproduction details:
>         Single node drillbit installation.
>         DRILL_MAX_DIRECT_MEMORY="8G"
>         DRILL_HEAP="4G"
> Run this query on TPCDS SF100 data set
> {code}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS TotalSpend FROM
store_sales ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1 LIMIT 10;
> {code}
> drillbit.log
> {code}
> 2015-08-26 16:54:58,469 [2a2210a7-7a78-c774-d54c-c863d0b77bb0:frag:3:22] INFO  o.a.d.e.w.f.FragmentStatusReporter
- 2a2210a7-7a78-c774-d54c-c863d0b77bb0:3:22: State to report: RUNNING
> 2015-08-26 16:55:50,498 [BitServer-5] WARN  o.a.drill.exec.rpc.data.DataServer - Message
of mode REQUEST of rpc type 3 took longer than 500ms.  Actual duration was 2569ms.
> 2015-08-26 16:56:31,086 [BitServer-5] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception
in RPC communication.  Connection: /10.10.88.133:31012 <--> /10.10.88.133:54554 (data
server).  Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329)
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
>         at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_71]
>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.7.0_71]
>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) ~[na:1.7.0_71]
>         at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.PoolArena.reallocate(PoolArena.java:280) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:110) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.WrappedByteBuf.writeBytes(WrappedByteBuf.java:600) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.UnsafeDirectLittleEndian.writeBytes(UnsafeDirectLittleEndian.java:28)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:4.0.27.Final]
>         at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92)
~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:227)
~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         ... 11 common frames omitted
> 2015-08-26 16:56:31,087 [BitServer-5] INFO  o.a.d.exec.rpc.ProtobufLengthDecoder - Channel
is closed, discarding remaining 124958 byte(s) in buffer.
> 2015-08-26 16:56:31,087 [BitClient-1] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception
in RPC communication.  Connection: /10.10.88.133:54554 <--> /10.10.88.133:31012 (data
client).  Closing connection.
> java.io.IOException: syscall:read(...)() failed: Connection reset by peer
> 2015-08-26 16:56:31,088 [BitClient-1] INFO  o.a.drill.exec.rpc.data.DataClient - Channel
closed /10.10.88.133:54554 <--> /10.10.88.133:31012.
> 2015-08-26 16:56:35,325 [BitServer-6] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception
in RPC communication.  Connection: /10.10.88.133:31012 <--> /10.10.88.133:54555 (data
server).  Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329)
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> {code}
> Attached:
>         query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>         drillbit.log
>         jstack.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message