drill-dev mailing list archives

From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3206) Memory leak in window functions
Date Thu, 28 May 2015 20:56:25 GMT
Khurram Faraaz created DRILL-3206:
-------------------------------------

             Summary: Memory leak in window functions
                 Key: DRILL-3206
                 URL: https://issues.apache.org/jira/browse/DRILL-3206
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.0.0
         Environment: 21cc578b6b8c8f3ca1ebffd3dbb92e35d68bc726
            Reporter: Khurram Faraaz
            Assignee: Chris Westin


The test was run on a 4-node cluster on CentOS.

Size in bytes of the JSON data file:

{code}
[root@centos-01 ~]# hadoop fs -ls /tmp/twoKeyJsn.json
-rwxr-xr-x   3 root root  888409136 2015-04-20 18:32 /tmp/twoKeyJsn.json
{code}

{code}
0: jdbc:drill:schema=dfs.tmp> select count(key1) over(partition by key2 order by key1)
from `twoKeyJsn.json`;
java.lang.RuntimeException: java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out
of memory while executing the query.

Fragment 1:7

[Error Id: 8ffc94b9-1318-4841-9247-259155e97202 on centos-02.qa.lab:31010]
	at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
	at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85)
	at sqlline.TableOutputFormat.print(TableOutputFormat.java:116)
	at sqlline.SqlLine.print(SqlLine.java:1583)
	at sqlline.Commands.execute(Commands.java:852)
	at sqlline.Commands.sql(Commands.java:751)
	at sqlline.SqlLine.dispatch(SqlLine.java:738)
	at sqlline.SqlLine.begin(SqlLine.java:612)
	at sqlline.SqlLine.start(SqlLine.java:366)
	at sqlline.SqlLine.main(SqlLine.java:259)
{code}

Memory usage after the above query was executed:
{code}
0: jdbc:drill:schema=dfs.tmp> select * from sys.memory;
+-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
|     hostname      | user_port  | heap_current  |  heap_max   | direct_current  | jvm_direct_current  | direct_max  |
+-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
| centos-01.qa.lab  | 31010      | 1304067160    | 4294967296  | 110019091       | 520095827           | 8589934592  |
| centos-03.qa.lab  | 31010      | 2020130800    | 4294967296  | 301360965       | 738199649           | 8589934592  |
| centos-02.qa.lab  | 31010      | 1253034864    | 4294967296  | 156397935       | 553649232           | 8589934592  |
| centos-04.qa.lab  | 31010      | 385872528     | 4294967296  | 203721765       | 553649246           | 8589934592  |
+-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
4 rows selected (0.134 seconds)
{code}

Memory details after rerunning the query; direct memory has grown on every node, which suggests we are leaking memory:

{code}
0: jdbc:drill:schema=dfs.tmp> select count(key1) over(partition by key2 order by key1)
from `twoKeyJsn.json`;
java.lang.RuntimeException: java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out
of memory while executing the query.

Fragment 1:7

[Error Id: fe56b1ff-02b6-4ded-a317-d753ab211f5b on centos-03.qa.lab:31010]
	at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
	at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85)
	at sqlline.TableOutputFormat.print(TableOutputFormat.java:116)
	at sqlline.SqlLine.print(SqlLine.java:1583)
	at sqlline.Commands.execute(Commands.java:852)
	at sqlline.Commands.sql(Commands.java:751)
	at sqlline.SqlLine.dispatch(SqlLine.java:738)
	at sqlline.SqlLine.begin(SqlLine.java:612)
	at sqlline.SqlLine.start(SqlLine.java:366)
	at sqlline.SqlLine.main(SqlLine.java:259)
0: jdbc:drill:schema=dfs.tmp> select * from sys.memory;
+-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
|     hostname      | user_port  | heap_current  |  heap_max   | direct_current  | jvm_direct_current  | direct_max  |
+-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
| centos-01.qa.lab  | 31010      | 2414546008    | 4294967296  | 438149911       | 905971795           | 8589934592  |
| centos-02.qa.lab  | 31010      | 1953483632    | 4294967296  | 901110416       | 1442841680          | 8589934592  |
| centos-03.qa.lab  | 31010      | 297329544     | 4294967296  | 560852951       | 1308624993          | 8589934592  |
| centos-04.qa.lab  | 31010      | 458157528     | 4294967296  | 740156752       | 1207960670          | 8589934592  |
+-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
4 rows selected (0.118 seconds)
{code}
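The growth is easy to quantify from the two sys.memory snapshots above. A minimal sketch (not part of the original report; values copied from the jvm_direct_current column of the two tables) computing the per-node increase between the runs:

```python
# jvm_direct_current per node, copied from the two sys.memory snapshots above.
before = {
    "centos-01.qa.lab": 520095827,
    "centos-02.qa.lab": 553649232,
    "centos-03.qa.lab": 738199649,
    "centos-04.qa.lab": 553649246,
}
after = {
    "centos-01.qa.lab": 905971795,
    "centos-02.qa.lab": 1442841680,
    "centos-03.qa.lab": 1308624993,
    "centos-04.qa.lab": 1207960670,
}

# Direct memory grew on every node even though both queries failed,
# consistent with buffers not being released on the error path.
growth = {host: after[host] - before[host] for host in before}
for host, delta in sorted(growth.items()):
    print(f"{host}: +{delta / (1 << 20):.0f} MiB")
```

Every node shows several hundred MiB of additional direct memory held after the second failed run.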

There are 16 distinct partitions (PARTITION BY key2):

{code}
0: jdbc:drill:schema=dfs.tmp> select distinct key2 from `twoKeyJsn.json`;
+-------+
| key2  |
+-------+
| d     |
| c     |
| b     |
| 1     |
| a     |
| 0     |
| k     |
| m     |
| j     |
| h     |
| e     |
| n     |
| g     |
| f     |
| l     |
| i     |
+-------+
16 rows selected (28.967 seconds)
{code}

Details from drillbit.log:

{code}
error_type: SYSTEM
    message: "SYSTEM ERROR: java.lang.IllegalStateException: Failure while closing accountor.
 Expected private and shared pools to be set to initial values.  However, one or more were
not.  Stats are\n\tzone\tinit\tallocated\tdelta \n\tprivate\t1000000\t0\t1000000 \n\tshared\t9999000000\t9928320966\t70679034.\n\nFragment
1:8\n\n[Error Id: b7b41c03-1122-4fa4-b441-9aa10544a91e on centos-02.qa.lab:31010]"
    exception {
      exception_class: "java.lang.IllegalStateException"
      message: "Failure while closing accountor.  Expected private and shared pools to be
set to initial values.  However, one or more were not.  Stats are\n\tzone\tinit\tallocated\tdelta
\n\tprivate\t1000000\t0\t1000000 \n\tshared\t9999000000\t9928320966\t70679034."
      stack_trace {
        class_name: "org.apache.drill.exec.memory.AtomicRemainder"
        file_name: "AtomicRemainder.java"
        line_number: 200
        method_name: "close"
        is_native_method: false
      }
{code}
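The accountor stats in the log are internally consistent: the reported delta for each pool is simply its init value minus the value at close. A quick check of that arithmetic (values copied from the log message above; this sketch is not part of the report):

```python
# Pool stats from the "Failure while closing accountor" message:
#   zone     init         at_close
pools = {
    "private": (1000000, 0),
    "shared":  (9999000000, 9928320966),
}

# The delta Drill logs is init - at_close, i.e. memory still outstanding
# when the accountor closed; both computed deltas match the logged ones
# (1000000 and 70679034 bytes).
for zone, (init, at_close) in pools.items():
    delta = init - at_close
    print(f"{zone}: {delta} bytes outstanding")
```

About 70 MB of the shared pool (plus the full 1 MB private reservation) was never returned before close, which matches the leak seen in sys.memory.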

Stack trace:

{code}
org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes
ran out of memory while executing the query.

Fragment 1:7

[Error Id: fe56b1ff-02b6-4ded-a317-d753ab211f5b on centos-03.qa.lab:31010]
        at org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:458)
[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
        at org.apache.drill.exec.rpc.control.WorkEventBus.statusUpdate(WorkEventBus.java:71)
[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
        at org.apache.drill.exec.work.batch.ControlMessageHandler.handle(ControlMessageHandler.java:79)
[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
        at org.apache.drill.exec.rpc.control.ControlServer.handle(ControlServer.java:61) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
        at org.apache.drill.exec.rpc.control.ControlServer.handle(ControlServer.java:38) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
        at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
        at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
        at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
[netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:150)
[netty-handler-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
[netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
[netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
        at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
