drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Victoria Markman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-3265) Query with window function and sort below that spills to disk runs out of memory
Date Tue, 09 Jun 2015 17:48:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Victoria Markman updated DRILL-3265:
------------------------------------
    Attachment: drill-3265.log

drillbit.log with the additional information.

> Query with window function and sort below that spills to disk runs out of memory
> --------------------------------------------------------------------------------
>
>                 Key: DRILL-3265
>                 URL: https://issues.apache.org/jira/browse/DRILL-3265
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.0.0
>            Reporter: Victoria Markman
>            Assignee: Chris Westin
>         Attachments: drill-3265.log
>
>
> Failure:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >   sum(ss_quantity) over(partition by ss_store_sk order by
ss_sold_date_sk)
> . . . . . . . . . . . . > from store_sales;
> java.lang.RuntimeException: java.sql.SQLException: RESOURCE ERROR: One or more nodes
ran out of memory while executing the query.
> Fragment 1:13
> [Error Id: 72609220-e431-41fc-8505-8f9740c96153 on atsqa4-133.qa.lab:31010]
>         at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>         at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85)
>         at sqlline.TableOutputFormat.print(TableOutputFormat.java:116)
>         at sqlline.SqlLine.print(SqlLine.java:1583)
>         at sqlline.Commands.execute(Commands.java:852)
>         at sqlline.Commands.sql(Commands.java:751)
>         at sqlline.SqlLine.dispatch(SqlLine.java:738)
>         at sqlline.SqlLine.begin(SqlLine.java:612)
>         at sqlline.SqlLine.start(SqlLine.java:366)
>         at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> Failure looks legitimate from customer point of view (granted that we know that we can
run out of memory in some cases)
> However, there are couple of problems with this scenario:
> 1. It looks like we are running out of memory during disk based sort
> {code}
> 2015-06-09 16:55:45,693 [2a88e633-b714-49a1-c36a-509a9c817c77:frag:1:19] WARN  o.a.d.e.p.i.xsort.ExternalSortBatch
- Starting to merge. 3311 batch groups. Current allocated memory: 17456644
> 2015-06-09 16:55:45,763 [2a88e633-b714-49a1-c36a-509a9c817c77:frag:1:13] WARN  o.a.d.exec.memory.BufferAllocator
- Unable to allocate buffer of size 20833 due to memory limit. Current allocation: 19989451
> java.lang.Exception: null
>         at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:253)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:273)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.UInt1Vector.allocateNew(UInt1Vector.java:171)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.NullableIntVector.allocateNew(NullableIntVector.java:204)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.AllocationHelper.allocateNew(AllocationHelper.java:56)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.test.generated.PriorityQueueCopierGen139.allocateVectors(PriorityQueueCopierTemplate.java:123)
[na:na]
>         at org.apache.drill.exec.test.generated.PriorityQueueCopierGen139.next(PriorityQueueCopierTemplate.java:66)
[na:na]
>         at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:224)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:92)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.window.WindowFrameRecordBatch.innerNext(WindowFrameRecordBatch.java:123)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:95)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:259)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:253)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_71]
>         at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:253)
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
[drill-common-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> 2015-06-09 16:55:45,765 [2a88e633-b714-49a1-c36a-509a9c817c77:frag:1:13] INFO  o.a.d.c.exceptions.UserException
- User Error Occurred
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran
out of memory while executing the query.
> {code}
> 2. The same sort running by itself and spilling to disk finishes successfully.
> 3. There is an indication in drillbit.log that memory leak has occurred.
> {code}
> 2015-06-09 16:55:46,053 [2a88e633-b714-49a1-c36a-509a9c817c77:frag:1:11] ERROR o.a.d.c.exceptions.UserException
- SYSTEM ERROR: java.lang.IllegalStateException: Failure while closing accountor.  Expected
private and shared pools to be set to initial values.  However, one or more were not.  Stats
are
>         zone    init    allocated       delta
>         private 10000000        10636   9989364
>         shared  10000000        3114634 6885366.
> Fragment 1:11
> [Error Id: 607d4f33-dda8-4cd9-8489-e34e914737e9 on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: java.lang.IllegalStateException:
Failure while closing accountor.  Expected private and shared pools to be set to initial values.
 However, one or more were not.  Stats are
>         zone    init    allocated       delta
>         private 10000000        10636   9989364
>         shared  10000000        3114634 6885366.
> Fragment 1:11
> {code}   
> 3. Ocasionally, after executing the same query after it failed, I see an error on failing
to created a spill directory even though I have enough disk space in my spill directory that
is configured to be on a local file system.
> (remember, sort by itself runs fine)
> {code}
> 0: jdbc:drill:schema=dfs> select 
> . . . . . . . . . . . . >   sum(ss_quantity) over(partition by ss_store_sk order by
ss_sold_date_sk) 
> . . . . . . . . . . . . > from store_sales;
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: java.io.IOException:
Mkdirs failed to create /tmp/drill/spill/2a88e74d-df47-146d-0cf9-8773303a7f37/major_fragment_1/minor_fragment_0/operator_4
(exists=false, cwd=file:///opt/mapr/drill/drill-1.1.0/bin)
> {code}
> Additional information:
> Plan for the query that runs out of memory:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select
> . . . . . . . . . . . . >   sum(ss_quantity) over(partition by ss_store_sk order by
ss_sold_date_sk)
> . . . . . . . . . . . . > from store_sales;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      UnionExchange
> 01-01        Project(EXPR$0=[CASE(>($3, 0), CAST($4):ANY, null)])
> 01-02          Window(window#0=[window(partition {1} order by [2] range between UNBOUNDED
PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])])01-03            SelectionVectorRemover
> 01-04              Sort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[ASC])
> 01-05                Project(ss_quantity=[$0], ss_store_sk=[$1], ss_sold_date_sk=[$2])
> 01-06                  HashToRandomExchange(dist0=[[$1]])
> 02-01                    UnorderedMuxExchange
> 03-01                      Project(ss_quantity=[$0], ss_store_sk=[$1], ss_sold_date_sk=[$2],
E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1))])
> 03-02                        Project(ss_quantity=[$2], ss_store_sk=[$1], ss_sold_date_sk=[$0])
> 03-03                          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:///tpcds100/parquet/store_sales]], selectionRoot=/tpcds100/parquet/store_sales,
numFiles=1, columns=[`ss_qua
> ntity`, `ss_store_sk`, `ss_sold_date_sk`]]])
> {code}
> The same sort executes succesfully when window function is not present:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select ss_quantity, ss_store_sk, ss_sold_date_sk
from store_sales order by ss_store_sk, ss_sold_date_sk;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(ss_quantity=[$0], ss_store_sk=[$1], ss_sold_date_sk=[$2])
> 00-02        SingleMergeExchange(sort0=[1 ASC], sort1=[2 ASC])
> 01-01          SelectionVectorRemover
> 01-02            Sort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[ASC])
> 01-03              Project(ss_quantity=[$0], ss_store_sk=[$1], ss_sold_date_sk=[$2])
> 01-04                HashToRandomExchange(dist0=[[$1]], dist1=[[$2]])
> 02-01                  UnorderedMuxExchange
> 03-01                    Project(ss_quantity=[$0], ss_store_sk=[$1], ss_sold_date_sk=[$2],
E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($2, hash64AsDouble($1)))])
> 03-02                      Project(ss_quantity=[$2], ss_store_sk=[$1], ss_sold_date_sk=[$0])
> 03-03                        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:///tpcds100/parquet/store_sales]], selectionRoot=/tpcds100/parquet/store_sales,
numFiles=1, columns=[`ss_quantity`, `ss_store_sk`, `ss_sold_date_sk`]]])
> {code}
> To reproduce:
> 1. Single drillbit
> 2. Configuration:
>         DRILL_MAX_DIRECT_MEMORY="8G"
>         DRILL_HEAP="4G" 
> 3. Dataset: tpcds100 parquet
> 4. Run query:
> {code}
> select 
>   sum(ss_quantity) over(partition by ss_store_sk order by ss_sold_date_sk) 
> from store_sales;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message