drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5522) OOM during the merge and spill process of the managed external sort
Date Sun, 18 Jun 2017 18:57:02 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053293#comment-16053293 ]

Paul Rogers commented on DRILL-5522:
------------------------------------

Revised the "record batch sizer" to consider actual allocated memory. This handles both the
case of a batch built by a scanner and that of a batch deserialized from an exchange.

The record batch sizer expected vectors to be of the form that a reader produces, in which
each vector “part” (offsets, data, bits) is backed by a separately allocated buffer. In
this structure, knowing the size of each buffer is sufficient to learn the total memory
for a record batch. This approach has worked fine in many tests.
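
To make that assumption concrete, here is a minimal sketch (plain Java, illustrative only;
not the actual RecordBatchSizer code, and the vector shapes and sizes are invented) of sizing
under the one-buffer-per-part model, where summing per-part capacities gives the batch footprint:

{code}
import java.util.List;

// Illustrative only, not Drill's RecordBatchSizer. Each vector is
// modeled as the capacities of its separately allocated "part" buffers
// (bits, offsets, data). Under the original assumption, the batch
// footprint is just the sum of those capacities.
public class NaiveBatchSizer {
  static long batchSize(List<long[]> vectors) {
    long total = 0;
    for (long[] partCapacities : vectors) {  // one entry per vector
      for (long cap : partCapacities) {      // one entry per part
        total += cap;  // correct only if each part owns its own buffer
      }
    }
    return total;
  }

  public static void main(String[] args) {
    long[] nullableVarChar = { 4096, 16384, 32768 };  // bits, offsets, data
    long[] bigInt = { 32768 };                        // data only
    System.out.println(batchSize(List.of(nullableVarChar, bigInt)));  // 86016
  }
}
{code}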

But, once exchanges enter the picture, these assumptions become invalid. Rather than each vector
having its own buffers, the vectors instead share a single big buffer, with each vector's buffers
being “slices” of that big buffer. Add up the slices and the total is far less than the big
buffer (up to 50% less).
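
A small demonstration of the slicing effect using plain Netty (Drill wraps Netty buffers, but
the arithmetic is the same; the slice sizes below are invented for illustration):

{code}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

// Three vector "parts" carved as slices from one pooled 2 MiB buffer.
// Summing the slice capacities badly undercounts the memory the pool
// actually holds for the batch.
public class SliceUndercount {
  public static void main(String[] args) {
    ByteBuf big = PooledByteBufAllocator.DEFAULT.directBuffer(2 * 1024 * 1024);
    ByteBuf bits    = big.slice(0, 64 * 1024);
    ByteBuf offsets = big.slice(64 * 1024, 256 * 1024);
    ByteBuf data    = big.slice(320 * 1024, 768 * 1024);
    long sliceTotal = bits.capacity() + offsets.capacity() + data.capacity();
    System.out.println("sum of slices:  " + sliceTotal);      // 1114112 (~53%)
    System.out.println("backing buffer: " + big.capacity());  // 2097152
    big.release();
  }
}
{code}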

The difference is due to the Netty memory allocator, which rounds buffer sizes up to a power
of two. In this particular test, the memory used by the vector "payload" is only half of the
total deserialized buffer size. This leads to an OOM in the sort: we use the data size to
decide whether to spill, but then actually incur the full cost of the backing buffer. As a
result, the sort's memory allocator no longer has space available for the required SV2, and
we get an OOM.
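
For concreteness, a back-of-the-envelope sketch of the rounding gap (plain Java, not Drill
code; the payload figure is invented):

{code}
// A pooled allocator rounds each request up to the next power of two,
// so a payload just over 1 MiB is charged a full 2 MiB. The sizer
// budgeted spilling on 'payload', but the allocator debits 'allocated';
// the difference is the headroom the sort expected to have for its SV2.
public class PowerOfTwoGap {
  // Next power of two at or above 'size' (for size >= 1).
  static long roundUpToPowerOfTwo(long size) {
    return size <= 1 ? 1 : Long.highestOneBit(size - 1) << 1;
  }

  public static void main(String[] args) {
    long payload = 1_100_000;                       // sum of slice sizes
    long allocated = roundUpToPowerOfTwo(payload);  // 2_097_152 (2 MiB)
    System.out.printf("payload=%d allocated=%d payload/allocated=%.0f%%%n",
        payload, allocated, 100.0 * payload / allocated);
  }
}
{code}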

The sizer now consults the underlying ledgers to get the true memory allocation (a sketch of
the idea follows the list below). Long term, we are doing far too much work to work around
the original design. A better long-term design would be:

* Each record batch is aware of the amount of memory used to store that batch.
* Operators do not have their own allocators that enforce limits. Instead, allocation is done
at the fragment level.
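
A hypothetical sketch of the ledger-based accounting (class and field names here are
illustrative, not Drill's actual ledger API): trace each buffer back to its backing
allocation and count each distinct backing once, at its rounded-up allocated size.

{code}
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only. Each buffer is traced back to its backing
// allocation ("ledger"), and each distinct ledger is counted once, at
// its rounded-up allocated size, so slices of one big deserialized
// buffer are neither undercounted nor double-counted.
public class LedgerAwareSizer {
  static final class Ledger {            // stand-in for the root allocation
    final long allocatedBytes;           // power-of-two rounded size
    Ledger(long allocatedBytes) { this.allocatedBytes = allocatedBytes; }
  }
  static final class BufferSlice {       // a vector part's view of a ledger
    final Ledger ledger;
    BufferSlice(Ledger ledger) { this.ledger = ledger; }
  }

  static long batchSize(List<BufferSlice> buffers) {
    Map<Ledger, Boolean> seen = new IdentityHashMap<>();
    long total = 0;
    for (BufferSlice b : buffers) {
      if (seen.put(b.ledger, Boolean.TRUE) == null) {
        total += b.ledger.allocatedBytes;  // count each ledger once
      }
    }
    return total;
  }

  public static void main(String[] args) {
    Ledger big = new Ledger(2_097_152);    // one 2 MiB allocation
    List<BufferSlice> batch = List.of(     // three parts share the ledger
        new BufferSlice(big), new BufferSlice(big), new BufferSlice(big));
    System.out.println(batchSize(batch));  // 2097152, charged once
  }
}
{code}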

> OOM during the merge and spill process of the managed external sort
> -------------------------------------------------------------------
>
>                 Key: DRILL-5522
>                 URL: https://issues.apache.org/jira/browse/DRILL-5522
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0
>            Reporter: Rahul Challapalli
>            Assignee: Paul Rogers
>         Attachments: 26e334aa-1afa-753f-3afe-862f76b80c18.sys.drill, drillbit.log, drillbit.out, drill-env.sh
>
>
> git.commit.id.abbrev=1e0a14c
> The query below fails with an OOM:
> {code}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.memory.max_query_memory_per_node` = 1552428800;
> create table dfs.drillTestDir.xsort_ctas3_multiple partition by (type, aCol) as select type, rptds, rms, s3.rms.a aCol, uid from (
>   select * from (
>     select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid
>     from (
>       select d.type type, d.uid uid, flatten(d.map.rm) rms from dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid
>     ) s1
>   ) s2
>   order by s2.rms.mapid, s2.rptds.a
> ) s3;
> {code}
> Stack trace
> {code}
> 2017-05-17 15:15:35,027 [26e334aa-1afa-753f-3afe-862f76b80c18:frag:4:2] INFO  o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes ran out of memory while executing the query. (Unable to allocate buffer of size 2097152 due to memory limit. Current allocation: 29229064)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 2097152 due to memory limit. Current allocation: 29229064
> [Error Id: 619e2e34-704c-4964-a354-1348fb33ce8a ]
>         at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate buffer of size 2097152 due to memory limit. Current allocation: 29229064
>         at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.BigIntVector.reAlloc(BigIntVector.java:212) ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.BigIntVector.copyFromSafe(BigIntVector.java:324) ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe(NullableBigIntVector.java:367) ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe(NullableBigIntVector.java:328) ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe(RepeatedMapVector.java:360) ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe(MapVector.java:220) ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.vector.complex.MapVector.copyFromSafe(MapVector.java:82) ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.test.generated.PriorityQueueCopierGen49.doCopy(PriorityQueueCopierTemplate.java:34) ~[na:na]
>         at org.apache.drill.exec.test.generated.PriorityQueueCopierGen49.next(PriorityQueueCopierTemplate.java:76) ~[na:na]
>         at org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next(CopierHolder.java:234) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1214) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:689) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_111]
>         at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_111]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) ~[hadoop-common-2.7.0-mapr-1607.jar:na]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         ... 4 common frames omitted
> {code}
> I attached the logs and the profile files. The data set is too large to attach here.



