drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From François Méthot <fmetho...@gmail.com>
Subject Re: Aggregation OutOfMemoryException
Date Wed, 16 Mar 2016 13:47:57 GMT
The default spill directory (/tmp) did not have enough space. We fixed
that. (thanks John)

I altered session to set
planner.memory.max_query_memory_per_node = 17179869184 (16GB)
planner.enable_hashjoin=false;
planner.enable_hashadd=false;

We ran our aggregation.

After 7h44m.

We got

Error: RESOURCE ERROR: External Sort encountered an error while spilling to
disk

Fragment 7:35

Caused by org.apache.drill.exec.exception.OutOfMemory: Unable to allocate
buffer of size 65536 (rounded from 37444) due to memory limit. Current
allocation: 681080448.

org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)

org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:191)

org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStream(VectorAccessibleSerializable.java:112)

org.apache.drill.exec.physical.impl.xsort.BatchGroup.getBatch(BatchGroup.java:110)

org.apache.drill.exec.physical.impl.xsort.BatchGroup.getNextIndex(BatchGroup.java:136)

org.apache.drill.exec.test.generated.PriorityQueuedCopierGen975.next(PriorityQueueCopierTemplate.java:76)

org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:557)


I think we were close to have the query completed, In the Fragment Profiles
Web UI, the 2 bottom major fragment (out of 5) were showing that they were
done.
I had the same query working on a (20x) smaller set of data.
Should I add more mem to planner.memory.max_query_memory_per_node ?



Abdel:
We did get the memory leak below while doing streaming aggregation, when
our /tmp directory was too small.
After fixing that, our streaming  aggregation got us the error above.

Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query.
Memory leaked: (389120)
Allocator(op:6:51:2:ExternalSort) 2000000/389120/680576640/715827882
(res/actual/peal/limit)

Fragment 6:51

[Error Id: ..... on node014.prod:31010]


  (java.lan.IllegalStateException) Memory was leaked by query. Memory
leaked (389120)
Allocator(op:6:51:2:ExternalSort) 2000000/389120/680576640/715827882
(res/actual/peal/limit)
    org.apache.drill.exec.memory.BaseAllocator.close():492
    org.apache.drill.exec.ops.OperatorContextImpl.close():124
    org.apache.drill.exec.ops.FragmentContext.supressingClose():416
    org.apache.drill.exec.ops.FragmentContext.close():405

org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():343
    org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():180
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():287
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrentThreadPoolExecutor.runWorker():1142


Thanks guys for your feedback.



On Sat, Mar 12, 2016 at 1:18 AM, Abdel Hakim Deneche <adeneche@maprtech.com>
wrote:

> Disabling hash aggregation will default to streaming aggregation + sort.
> This will allow you to handle larger data and spill to disk if necessary.
>
> Like stated in the documentation, starting from Drill 1.5 the default
> memory limit of sort may not be enough to process large data, but you can
> bump it up by increasing planner.memory.max_query_memory_per_node (defaults
> to 2GB), and if necessary reducing planner.width.max_per_node (defaults to
> 75% of number of cores).
>
> You said disabling hash aggregate and hash join causes a memory leak. Can
> you give more details about the error ? the query may fail with an out of
> memory but it shouldn't leak.
>
> On Fri, Mar 11, 2016 at 10:53 PM, John Omernik <john@omernik.com> wrote:
>
> > I've had some luck disabling multi-phase aggregations on some queries
> where
> > memory was an issue.
> >
> > https://drill.apache.org/docs/guidelines-for-optimizing-aggregation/
> >
> > After I try that, than I typically look at the hash aggregation as you
> have
> > done:
> >
> >
> >
> https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/
> >
> > I've had limited success with changing the max_query_memory_per_node and
> > max_width, sometimes it's a weird combination of things that work in
> there.
> >
> > https://drill.apache.org/docs/troubleshooting/#memory-issues
> >
> > Back to your spill stuff if you disable hash aggregation, do you know if
> > your spill directories are setup? That may be part of the issue, I am not
> > sure what the default spill behavior of Drill is for spill directory
> setup.
> >
> >
> >
> > On Fri, Mar 11, 2016 at 2:17 PM, François Méthot <fmethot78@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > >    Using version 1.5, DirectMemory is currently set at 32GB, heap is at
> > > 8GB. We have been trying to perform multiple aggregation in one query
> > (see
> > > below) on 40 Billions+ rows stored on 13 nodes. We are using parquet
> > > format.
> > >
> > > We keep getting OutOfMemoryException: Failure allocating buffer..
> > >
> > > on a query that looks like this:
> > >
> > > create table hdfs.`test1234` as
> > > (
> > > select string_field1,
> > >           string_field2,
> > >           min ( int_field3 ),
> > >           max ( int_field4 ),
> > >           count(1),
> > >           count ( distinct int_field5 ),
> > >           count ( distinct int_field6 ),
> > >           count ( distinct string_field7 )
> > >     from hdfs.`/data/`
> > >     group by string_field1, string_field2;
> > > )
> > >
> > > The documentation state:
> > > "Currently, hash-based operations do not spill to disk as needed."
> > >
> > > and
> > >
> > > "If the hash-based operators run out of memory during execution, the
> > query
> > > fails. If large hash operations do not fit in memory on your system,
> you
> > > can disable these operations. When disabled, Drill creates alternative
> > > plans that allow spilling to disk."
> > >
> > > My understanding is that it will fall back to Streaming aggregation,
> > which
> > > required sorting..
> > >
> > > but
> > >
> > > "As of Drill 1.5, ... the sort operator (in queries that ran
> successfully
> > > in previous releases) may not have enough memory, resulting in a failed
> > > query"
> > >
> > > And Indeed, disabling hash agg and hash join resulted in memory leak
> > error.
> > >
> > > So it looks like increasing direct memory our only option.
> > >
> > > Is there a plan to have Hash Aggregation to spill on disk in the next
> > > release?
> > >
> > >
> > > Thanks for your feedback
> > >
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   <http://www.mapr.com/>
>
>
> Now Available - Free Hadoop On-Demand Training
> <
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message