drill-dev mailing list archives

From François Méthot <fmetho...@gmail.com>
Subject Re: Aggregation OutOfMemoryException
Date Wed, 30 Mar 2016 12:48:41 GMT
Thank you Abdel,

 After following your recommendation, our query actually went through. From
the Fragment Profiles overview in the Web UI, we saw that after 2 days of
processing, a thread was still doing some work. We left it running for
another 3 days; nothing was happening, so we hit Ctrl-C from the console
that had initiated the query, and it returned successfully.

Francois


On Wed, Mar 16, 2016 at 1:30 PM, Abdel Hakim Deneche <adeneche@maprtech.com>
wrote:

> actually:
>
> sort limit = MQMPN / (NS * MPN * NC * 0.7)
>
> On Wed, Mar 16, 2016 at 6:30 PM, Abdel Hakim Deneche <
> adeneche@maprtech.com>
> wrote:
>
> > the sort memory limit is computed as follows:
> >
> > MQMPN = planner.memory.max_query_memory_per_node
> > MPN = planner.width.max_per_node
> > NC = number of cores in each cluster node
> > NS = number of sort operators in the query
> >
> > sort limit = MQMPN / (MPN * NC * 0.7)
> >
> > In your case I assume the query contains a single sort operator and you
> > have 16 cores per node. To increase the sort limit you can increase the
> > value of max_query_memory_per_node and you can also reduce the value of
> > planner.width.max_per_node. Please note that reducing the value of the
> > latter option may increase the query's execution time.
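
[Editor's note: a minimal sketch of the corrected formula posted above
(sort limit = MQMPN / (NS * MPN * NC * 0.7)). The 16 cores per node match
Abdel's assumption; the width value of 12 is a hypothetical illustrative
number, not taken from this thread.]

```python
def sort_limit(mqmpn, ns, mpn, nc):
    # Corrected per-sort memory limit formula from this thread:
    #   sort limit = MQMPN / (NS * MPN * NC * 0.7)
    return mqmpn / (ns * mpn * nc * 0.7)

# Illustrative values: 16 GB max_query_memory_per_node (the value set
# later in this thread), one sort operator, 16 cores per node, and an
# assumed planner.width.max_per_node of 12.
limit = sort_limit(16 * 1024**3, 1, 12, 16)
print(int(limit))  # per-sort budget in bytes, roughly 128 million
```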
> >
> > On Wed, Mar 16, 2016 at 2:47 PM, François Méthot <fmethot78@gmail.com>
> > wrote:
> >
> >> The default spill directory (/tmp) did not have enough space. We fixed
> >> that. (thanks John)
> >>
> >> I altered session to set
> >> planner.memory.max_query_memory_per_node = 17179869184 (16GB)
> >> planner.enable_hashjoin=false;
> >> planner.enable_hashagg=false;
> >>
> >> We ran our aggregation.
> >>
> >> After 7h44m.
> >>
> >> We got
> >>
> >> Error: RESOURCE ERROR: External Sort encountered an error while
> >> spilling to disk
> >>
> >> Fragment 7:35
> >>
> >> Caused by org.apache.drill.exec.exception.OutOfMemoryException: Unable
> >> to allocate buffer of size 65536 (rounded from 37444) due to memory
> >> limit. Current allocation: 681080448.
> >>
> >> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
> >> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:191)
> >> org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStream(VectorAccessibleSerializable.java:112)
> >> org.apache.drill.exec.physical.impl.xsort.BatchGroup.getBatch(BatchGroup.java:110)
> >> org.apache.drill.exec.physical.impl.xsort.BatchGroup.getNextIndex(BatchGroup.java:136)
> >> org.apache.drill.exec.test.generated.PriorityQueueCopierGen975.next(PriorityQueueCopierTemplate.java:76)
> >> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:557)
> >>
> >>
> >> I think we were close to having the query complete. In the Fragment
> >> Profiles Web UI, the 2 bottom major fragments (out of 5) were showing
> >> as done.
> >> I had the same query working on a (20x) smaller set of data.
> >> Should I add more memory to planner.memory.max_query_memory_per_node ?
> >>
> >>
> >>
> >> Abdel:
> >> We did get the memory leak below while doing streaming aggregation,
> >> when our /tmp directory was too small.
> >> After fixing that, our streaming aggregation got us the error above.
> >>
> >> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query.
> >> Memory leaked: (389120)
> >> Allocator(op:6:51:2:ExternalSort) 2000000/389120/680576640/715827882
> >> (res/actual/peak/limit)
> >>
> >> Fragment 6:51
> >>
> >> [Error Id: ..... on node014.prod:31010]
> >>
> >>   (java.lang.IllegalStateException) Memory was leaked by query. Memory
> >> leaked (389120)
> >> Allocator(op:6:51:2:ExternalSort) 2000000/389120/680576640/715827882
> >> (res/actual/peak/limit)
> >>     org.apache.drill.exec.memory.BaseAllocator.close():492
> >>     org.apache.drill.exec.ops.OperatorContextImpl.close():124
> >>     org.apache.drill.exec.ops.FragmentContext.suppressingClose():416
> >>     org.apache.drill.exec.ops.FragmentContext.close():405
> >>     org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():343
> >>     org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():180
> >>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():287
> >>     org.apache.drill.common.SelfCleaningRunnable.run():38
> >>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> >>
> >>
> >> Thanks guys for your feedback.
> >>
> >>
> >>
> >> On Sat, Mar 12, 2016 at 1:18 AM, Abdel Hakim Deneche <
> >> adeneche@maprtech.com>
> >> wrote:
> >>
> >> > Disabling hash aggregation will default to streaming aggregation +
> >> > sort. This will allow you to handle larger data and spill to disk if
> >> > necessary.
> >> >
> >> > As stated in the documentation, starting from Drill 1.5 the default
> >> > memory limit of the sort may not be enough to process large data, but
> >> > you can bump it up by increasing
> >> > planner.memory.max_query_memory_per_node (defaults to 2GB) and, if
> >> > necessary, reducing planner.width.max_per_node (defaults to 75% of
> >> > the number of cores).
> >> >
> >> > You said disabling hash aggregate and hash join causes a memory leak.
> >> > Can you give more details about the error? The query may fail with an
> >> > out of memory error, but it shouldn't leak.
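
[Editor's note: a sketch of why the default budget can be too small,
using the sort-limit formula quoted earlier in this thread
(MQMPN / (NS * MPN * NC * 0.7)). The 16-core node and single sort
operator are assumptions for illustration.]

```python
# Defaults described above: 2 GB query memory per node, width defaulting
# to 75% of cores. Assumed: one sort operator on a 16-core node.
mqmpn = 2 * 1024**3                  # planner.memory.max_query_memory_per_node
ns, nc = 1, 16                       # sorts in the query, cores per node
mpn = int(0.75 * nc)                 # planner.width.max_per_node -> 12
limit = mqmpn / (ns * mpn * nc * 0.7)
print(int(limit))  # roughly 16 million bytes per sort
```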
> >> >
> >> > On Fri, Mar 11, 2016 at 10:53 PM, John Omernik <john@omernik.com>
> >> wrote:
> >> >
> >> > > I've had some luck disabling multi-phase aggregations on some
> >> > > queries where memory was an issue.
> >> > >
> >> > > https://drill.apache.org/docs/guidelines-for-optimizing-aggregation/
> >> > >
> >> > > After I try that, then I typically look at the hash aggregation as
> >> > > you have done:
> >> > >
> >> > > https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/
> >> > >
> >> > > I've had limited success with changing max_query_memory_per_node
> >> > > and max_width; sometimes it's a weird combination of things that
> >> > > work in there.
> >> > >
> >> > > https://drill.apache.org/docs/troubleshooting/#memory-issues
> >> > >
> >> > > Back to your spill stuff: if you disable hash aggregation, do you
> >> > > know if your spill directories are set up? That may be part of the
> >> > > issue; I am not sure what the default spill behavior of Drill is
> >> > > for spill directory setup.
> >> > >
> >> > >
> >> > >
> >> > > On Fri, Mar 11, 2016 at 2:17 PM, François Méthot <
> fmethot78@gmail.com
> >> >
> >> > > wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > >    Using version 1.5, DirectMemory is currently set at 32GB, heap
> >> > > > is at 8GB. We have been trying to perform multiple aggregations
> >> > > > in one query (see below) on 40 billion+ rows stored on 13 nodes.
> >> > > > We are using parquet format.
> >> > > >
> >> > > > We keep getting OutOfMemoryException: Failure allocating buffer..
> >> > > >
> >> > > > on a query that looks like this:
> >> > > >
> >> > > > create table hdfs.`test1234` as
> >> > > > (
> >> > > > select string_field1,
> >> > > >        string_field2,
> >> > > >        min ( int_field3 ),
> >> > > >        max ( int_field4 ),
> >> > > >        count(1),
> >> > > >        count ( distinct int_field5 ),
> >> > > >        count ( distinct int_field6 ),
> >> > > >        count ( distinct string_field7 )
> >> > > >   from hdfs.`/data/`
> >> > > >   group by string_field1, string_field2
> >> > > > );
> >> > > >
> >> > > > The documentation states:
> >> > > > "Currently, hash-based operations do not spill to disk as needed."
> >> > > >
> >> > > > and
> >> > > >
> >> > > > "If the hash-based operators run out of memory during execution,
> >> > > > the query fails. If large hash operations do not fit in memory on
> >> > > > your system, you can disable these operations. When disabled,
> >> > > > Drill creates alternative plans that allow spilling to disk."
> >> > > >
> >> > > > My understanding is that it will fall back to Streaming
> >> > > > aggregation, which requires sorting.
> >> > > >
> >> > > > but
> >> > > >
> >> > > > "As of Drill 1.5, ... the sort operator (in queries that ran
> >> > > > successfully in previous releases) may not have enough memory,
> >> > > > resulting in a failed query"
> >> > > >
> >> > > > And indeed, disabling hash agg and hash join resulted in a memory
> >> > > > leak error.
> >> > > >
> >> > > > So it looks like increasing direct memory is our only option.
> >> > > >
> >> > > > Is there a plan to have Hash Aggregation spill to disk in the
> >> > > > next release?
> >> > > >
> >> > > > Thanks for your feedback
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Abdelhakim Deneche
> >> >
> >> > Software Engineer
> >> >
> >> >   <http://www.mapr.com/>
> >> >
> >> >
> >> > Now Available - Free Hadoop On-Demand Training
> >> > <
> >> >
> >>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >> > >
> >> >
> >>
