drill-dev mailing list archives

From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: Out Of Memory Error (Possible Regression)
Date Wed, 30 Dec 2015 21:32:00 GMT
That must be it.

On Wed, Dec 30, 2015 at 1:22 PM, Steven Phillips <steven@dremio.com> wrote:

> No, we are running on 4 4-core machines.
>
> On Wed, Dec 30, 2015 at 2:10 PM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:
>
> > Are you running the tests on 32-core machines? A different number of cores
> > affects how much memory is available for the sort.
> >
> > On Wed, Dec 30, 2015 at 1:02 PM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:
> >
> > > The following tests are failing:
> > >
> > >> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q163_DRILL-2046.q
> > >> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q177_DRILL-2046.q
> > >> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q174.q
> > >> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/
> > >> /Functional/window_functions/multiple_partitions/q35.sql
> > >> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q160_DRILL-1985.q
> > >> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q162_DRILL-1985.q
> > >> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q165.q
> > >> /Functional/window_functions/multiple_partitions/q37.sql
> > >> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q171.q
> > >> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q168_DRILL-2046.q
> > >> /Functional/window_functions/multiple_partitions/q36.sql
> > >> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q159_DRILL-2046.q
> > >> /Functional/window_functions/multiple_partitions/q30.sql
> > >> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/large/q157_DRILL-1985.q
> > >> /Functional/window_functions/multiple_partitions/q22.sql
> > >
> > >
> > > With one of the following errors:
> > >
> > >> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory
> > >> while executing the query.
> > >> Caused by: org.apache.drill.exec.exception.OutOfMemoryException:
> > >> org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate
> > >> sv2, and not enough batchGroups to spill
> > >>         at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:356)
> > >> ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> > >
> > >
> > > or
> > >
> > >> java.sql.SQLException: SYSTEM ERROR: DrillRuntimeException: Failed to
> > >> pre-allocate memory for SV. Existing recordCount*4 = 0, incoming batch
> > >> recordCount*4 = 3340
> > >> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException:
> > >> Failed to pre-allocate memory for SV. Existing recordCount*4 = 0, incoming
> > >> batch recordCount*4 = 3340
> > >>         at org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder.add(SortRecordBatchBuilder.java:116)
> > >> ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> > >>         at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:451)
> > >> ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> > >
> > >
> > >
> > >
> > > On Wed, Dec 30, 2015 at 12:42 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> > >
> > >> I'll let Steven answer your question directly.
> > >>
> > >> FYI, we are running a regression suite that was forked from the MapR repo
> > >> a month or so ago because we had to fix a bunch of things to make it work
> > >> with Apache Hadoop. (There was a thread about this back then and we
> > >> haven't yet figured out how to merge both suites.) It is possible that he
> > >> had a successful run but the failures are happening on items that you've
> > >> recently added to your suite.
> > >>
> > >> It is also possible (likely?) that the configuration settings for our
> > >> regression clusters are not the same.
> > >>
> > >> --
> > >> Jacques Nadeau
> > >> CTO and Co-Founder, Dremio
> > >>
> > > >> On Wed, Dec 30, 2015 at 12:37 PM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:
> > > >>
> > > >> > Steven,
> > > >> >
> > > >> > Were you able to successfully run the regression tests on the transfer
> > > >> > patch? I just tried and saw several queries running out of memory!
> > >> >
> > > >> > On Wed, Dec 30, 2015 at 11:46 AM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:
> > > >> >
> > > >> > > Created DRILL-4236 <https://issues.apache.org/jira/browse/DRILL-4236>
> > > >> > > to keep track of this improvement.
> > >> > >
> > > >> > > On Wed, Dec 30, 2015 at 11:01 AM, Jacques Nadeau <jacques@dremio.com> wrote:
> > > >> > >
> > > >> > >> Since the accounting changed (it is now more accurate), the termination
> > > >> > >> condition for the sort operator will be different than before. In fact,
> > > >> > >> it will likely be reached sooner, since the memory we account for is
> > > >> > >> much larger than before (we now correctly consider the entire
> > > >> > >> allocation rather than only the used portion).
> > >> > >>
> > > >> > >> Hakim,
> > > >> > >> Steven and I were discussing the need to update the ExternalSort
> > > >> > >> operator to use the new allocator functionality to better manage its
> > > >> > >> memory envelope. Would you be interested in working on this, since you
> > > >> > >> seem to be working with that code the most? Basically, it used to be
> > > >> > >> that there was no way the sort operator would be able to correctly
> > > >> > >> detect a memory condition, so it jumped through a bunch of hoops to
> > > >> > >> try to figure out the termination condition. With the transfer
> > > >> > >> accounting in place, this code can be greatly simplified to just use
> > > >> > >> the current operator memory allocation.
> > >> > >>
> > >> > >> --
> > >> > >> Jacques Nadeau
> > >> > >> CTO and Co-Founder, Dremio
> > >> > >>
> > >> > >> On Wed, Dec 30, 2015 at 10:48 AM, rahul challapalli <
> > >> > >> challapallirahul@gmail.com> wrote:
> > >> > >>
> > > >> > >> > I installed the latest master and ran this query. So
> > > >> > >> > planner.memory.max_query_memory_per_node should have been the default
> > > >> > >> > value. I switched back to 1.4.0 branch and this query completed
> > > >> > >> > successfully.
> > >> > >> >
> > > >> > >> > On Wed, Dec 30, 2015 at 10:37 AM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:
> > > >> > >> >
> > > >> > >> > > Rahul,
> > > >> > >> > >
> > > >> > >> > > How much memory was assigned to the sort operator
> > > >> > >> > > (planner.memory.max_query_memory_per_node)?
> > >> > >> > >
> > > >> > >> > > On Wed, Dec 30, 2015 at 9:54 AM, rahul challapalli <challapallirahul@gmail.com> wrote:
> > > >> > >> > >
> > > >> > >> > > > I am seeing an OOM error while executing a simple CTAS query. I
> > > >> > >> > > > raised DRILL-4324 for this. The query mentioned in the JIRA used to
> > > >> > >> > > > complete successfully without any issue prior to 1.5. Any idea what
> > > >> > >> > > > could have caused the regression?
> > >> > >> > > >
> > >> > >> > > > - Rahul
> > >> > >> > > >
> > >> > >> > >
> > >> > >> > >
> > >> > >> > >
> > >> > >> > > --
> > >> > >> > >
> > >> > >> > > Abdelhakim Deneche
> > >> > >> > >
> > >> > >> > > Software Engineer
> > >> > >> > >
> > >> > >> > >   <http://www.mapr.com/>
> > >> > >> > >
> > >> > >> > >
> > >> > >> > > Now Available - Free Hadoop On-Demand Training
> > >> > >> > > <
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> >
> > >>
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >> > >> > > >
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > >
> > >> > > Abdelhakim Deneche
> > >> > >
> > >> > > Software Engineer
> > >> > >
> > >> > >   <http://www.mapr.com/>
> > >> > >
> > >> > >
> > >> > > Now Available - Free Hadoop On-Demand Training
> > >> > > <
> > >> >
> > >>
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >> > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > Abdelhakim Deneche
> > >> >
> > >> > Software Engineer
> > >> >
> > >> >   <http://www.mapr.com/>
> > >> >
> > >> >
> > >> > Now Available - Free Hadoop On-Demand Training
> > >> > <
> > >> >
> > >>
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >> > >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > >
> > > Abdelhakim Deneche
> > >
> > > Software Engineer
> > >
> > >   <http://www.mapr.com/>
> > >
> > >
> > > Now Available - Free Hadoop On-Demand Training
> > > <
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   <http://www.mapr.com/>
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >
> >
>



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
