drill-dev mailing list archives

From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: Aggregation OutOfMemoryException
Date Wed, 16 Mar 2016 17:30:35 GMT
The sort memory limit is computed as follows:

MQMPN = planner.memory.max_query_memory_per_node
MPN = planner.width.max_per_node
NC = number of cores in each cluster node
NS = number of sort operators in the query

sort limit = MQMPN / (NS * MPN), where MPN defaults to NC * 0.7

In your case I assume the query contains a single sort operator and you
have 16 cores per node. To increase the sort limit you can increase the
value of max_query_memory_per_node and you can also reduce the value of
planner.width.max_per_node. Please note that reducing the value of the
latter option may increase the query's execution time.
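To make the arithmetic concrete, here is a small sketch using the numbers from this thread. It assumes one sort operator, 16 cores per node, the 16GB setting François used below, and planner.width.max_per_node left at its default of 70% of cores; the division across NS * MPN is my reading of the formula above, not Drill source code.

```python
MQMPN = 17_179_869_184   # planner.memory.max_query_memory_per_node (16GB)
NC = 16                  # cores per node (assumed)
NS = 1                   # sort operators in the query (assumed)
MPN = int(NC * 0.7)      # planner.width.max_per_node default: 70% of cores -> 11

# Query memory is split across every sort instance running on the node.
sort_limit = MQMPN // (NS * MPN)
print(f"sort limit ~ {sort_limit / 2**30:.2f} GiB per sort instance")
```

So even with 16GB per node, each sort instance only gets on the order of 1.5 GiB, which is why reducing the width (fewer instances) raises the per-sort limit.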

On Wed, Mar 16, 2016 at 2:47 PM, François Méthot <fmethot78@gmail.com>
wrote:

> The default spill directory (/tmp) did not have enough space. We fixed
> that. (thanks John)
>
> I altered session to set
> planner.memory.max_query_memory_per_node = 17179869184 (16GB)
> planner.enable_hashjoin=false;
> planner.enable_hashagg=false;
>
> We ran our aggregation.
>
> After 7h44m.
>
> We got
>
> Error: RESOURCE ERROR: External Sort encountered an error while spilling to
> disk
>
> Fragment 7:35
>
> Caused by org.apache.drill.exec.exception.OutOfMemoryException: Unable to
> allocate buffer of size 65536 (rounded from 37444) due to memory limit.
> Current allocation: 681080448.
>
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
>
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:191)
>
>
> org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStream(VectorAccessibleSerializable.java:112)
>
>
> org.apache.drill.exec.physical.impl.xsort.BatchGroup.getBatch(BatchGroup.java:110)
>
>
> org.apache.drill.exec.physical.impl.xsort.BatchGroup.getNextIndex(BatchGroup.java:136)
>
>
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen975.next(PriorityQueueCopierTemplate.java:76)
>
>
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:557)
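Side note on the "rounded from" figure in the error above: Drill's direct-memory allocator (built on Netty's pooled allocator) rounds each buffer request up to the next power of two, which is how a 37444-byte request becomes a 65536-byte allocation that is charged against the limit. A minimal sketch of that rounding (the helper name is mine, not Drill's):

```python
def round_to_next_power_of_two(requested: int) -> int:
    """Round a buffer request up to the next power of two,
    as buddy-style allocators do."""
    return 1 << (requested - 1).bit_length()

# The request from the stack trace above:
print(round_to_next_power_of_two(37444))  # 65536
```

This means the allocator can run into the limit sooner than the raw request sizes would suggest.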
>
>
> I think we were close to having the query complete. In the Fragment Profiles
> web UI, the 2 bottom major fragments (out of 5) were showing as done.
> I had the same query working on a (20x) smaller set of data.
> Should I add more memory to planner.memory.max_query_memory_per_node ?
>
>
>
> Abdel:
> We did get the memory leak below while doing streaming aggregation, when
> our /tmp directory was too small.
> After fixing that, our streaming aggregation got us the error above.
>
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query.
> Memory leaked: (389120)
> Allocator(op:6:51:2:ExternalSort) 2000000/389120/680576640/715827882
> (res/actual/peak/limit)
>
> Fragment 6:51
>
> [Error Id: ..... on node014.prod:31010]
>
>
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory
> leaked: (389120)
> Allocator(op:6:51:2:ExternalSort) 2000000/389120/680576640/715827882
> (res/actual/peak/limit)
>     org.apache.drill.exec.memory.BaseAllocator.close():492
>     org.apache.drill.exec.ops.OperatorContextImpl.close():124
>     org.apache.drill.exec.ops.FragmentContext.suppressingClose():416
>     org.apache.drill.exec.ops.FragmentContext.close():405
>
>
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():343
>     org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():180
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():287
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>
>
> Thanks guys for your feedback.
>
>
>
> On Sat, Mar 12, 2016 at 1:18 AM, Abdel Hakim Deneche <
> adeneche@maprtech.com>
> wrote:
>
> > Disabling hash aggregation will default to streaming aggregation + sort.
> > This will allow you to handle larger data and spill to disk if necessary.
> >
> > As stated in the documentation, starting from Drill 1.5 the default
> > memory limit of the sort may not be enough to process large data, but you
> > can bump it up by increasing planner.memory.max_query_memory_per_node
> > (defaults to 2GB) and, if necessary, reducing planner.width.max_per_node
> > (defaults to 70% of the number of cores).
> >
> > You said disabling hash aggregate and hash join causes a memory leak. Can
> > you give more details about the error? The query may fail with an out of
> > memory error, but it shouldn't leak.
> >
> > On Fri, Mar 11, 2016 at 10:53 PM, John Omernik <john@omernik.com> wrote:
> >
> > > I've had some luck disabling multi-phase aggregations on some queries
> > where
> > > memory was an issue.
> > >
> > > https://drill.apache.org/docs/guidelines-for-optimizing-aggregation/
> > >
> > > After trying that, I typically look at the hash aggregation settings, as
> > > you have done:
> > >
> > >
> > >
> >
> https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/
> > >
> > > I've had limited success with changing max_query_memory_per_node and
> > > max_width; sometimes it's a weird combination of settings that works in
> > > there.
> > >
> > > https://drill.apache.org/docs/troubleshooting/#memory-issues
> > >
> > > Back to your spill question: if you disable hash aggregation, do you
> > > know whether your spill directories are set up? That may be part of the
> > > issue; I am not sure what Drill's default behavior is for spill
> > > directory setup.
> > >
> > >
> > >
> > > On Fri, Mar 11, 2016 at 2:17 PM, François Méthot <fmethot78@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > >    We are using version 1.5; DirectMemory is currently set at 32GB and
> > > > heap at 8GB. We have been trying to perform multiple aggregations in
> > > > one query (see below) on 40 billion+ rows stored on 13 nodes, in
> > > > Parquet format.
> > > >
> > > > We keep getting OutOfMemoryException: Failure allocating buffer..
> > > >
> > > > on a query that looks like this:
> > > >
> > > > create table hdfs.`test1234` as
> > > > (
> > > >   select string_field1,
> > > >          string_field2,
> > > >          min(int_field3),
> > > >          max(int_field4),
> > > >          count(1),
> > > >          count(distinct int_field5),
> > > >          count(distinct int_field6),
> > > >          count(distinct string_field7)
> > > >   from hdfs.`/data/`
> > > >   group by string_field1, string_field2
> > > > );
> > > >
> > > > The documentation states:
> > > > "Currently, hash-based operations do not spill to disk as needed."
> > > >
> > > > and
> > > >
> > > > "If the hash-based operators run out of memory during execution, the
> > > query
> > > > fails. If large hash operations do not fit in memory on your system,
> > you
> > > > can disable these operations. When disabled, Drill creates
> alternative
> > > > plans that allow spilling to disk."
> > > >
> > > > My understanding is that it will fall back to streaming aggregation,
> > > > which requires sorting.
> > > >
> > > > but
> > > >
> > > > "As of Drill 1.5, ... the sort operator (in queries that ran
> > successfully
> > > > in previous releases) may not have enough memory, resulting in a
> failed
> > > > query"
> > > >
> > > > And indeed, disabling hash agg and hash join resulted in a memory-leak
> > > > error.
> > > >
> > > > So it looks like increasing direct memory is our only option.
> > > >
> > > > Is there a plan to have hash aggregation spill to disk in the next
> > > > release?
> > > >
> > > >
> > > > Thanks for your feedback
> > > >
> > >
> >
> >
> >
>



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
