drill-dev mailing list archives

From Amit Hadke <amit.ha...@gmail.com>
Subject Re: Question about the RecordIterator
Date Tue, 15 Dec 2015 20:02:51 GMT
Yup, that may be it. I'll add an option to not hold on to left-side iterator
batches.

On Tue, Dec 15, 2015 at 11:56 AM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:

> RecordIterator.mark() is only called for the right side of the merge join.
> What about the left side: do we ever release the batches on the left side?
> In DRILL-4190, the sort that runs out of memory is on the left side of the merge.
>
> On Tue, Dec 15, 2015 at 11:51 AM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:
>
> > I see, it's in RecordIterator.mark()
> >
> > On Tue, Dec 15, 2015 at 11:50 AM, Abdel Hakim Deneche <
> > adeneche@maprtech.com> wrote:
> >
> >> Amit,
> >>
> >> Thanks for the prompt answer. Can you point me to where in the code the
> >> purge is done?
> >>
> >>
> >>
> >> On Tue, Dec 15, 2015 at 11:42 AM, Amit Hadke <amit.hadke@gmail.com>
> >> wrote:
> >>
> >>> Hi Hakim,
> >>> RecordIterator will not hold all batches in memory. It holds batches
> >>> from the last mark() operation.
> >>> It will purge batches as the join moves along.
> >>>
> >>> The worst case is when there are lots of repeating values on the right
> >>> side, which the iterator will hold in memory.
> >>>
> >>> ~ Amit.
> >>>
> >>> On Tue, Dec 15, 2015 at 11:23 AM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:
> >>>
> >>> > Amit,
> >>> >
> >>> > I am looking at DRILL-4190, where one of the sort operators is hitting
> >>> > its allocator limit when it's sending data downstream. This generally
> >>> > happens when a downstream operator is holding those batches in memory
> >>> > (e.g. Window Operator).
> >>> >
> >>> > The same query runs fine on 1.2.0, which seems to suggest that the
> >>> > recent changes to MergeJoinBatch "may" be causing the issue.
> >>> >
> >>> > It looks like RecordIterator is holding all incoming batches in a
> >>> > TreeRangeMap and, if I'm not mistaken, it doesn't release anything
> >>> > until it's closed. Is this correct?
> >>> >
> >>> > I am not familiar with how merge join used to work before
> >>> > RecordIterator. Was it also the case that we held all incoming batches
> >>> > in memory?
> >>> >
> >>> > Thanks
> >>> >
> >>> > --
> >>> >
> >>> > Abdelhakim Deneche
> >>> >
> >>> > Software Engineer
> >>> >
> >>> >   <http://www.mapr.com/>
> >>> >
> >>> >
> >>> > Now Available - Free Hadoop On-Demand Training
> >>> > <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
> >>> >
> >>>
>
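
The behaviour discussed in this thread can be illustrated with a minimal
sketch. This is not Drill's RecordIterator: the class MarkPurgeBuffer, its
keepForReset flag, and the counters are hypothetical names made up for the
example, and "releasing" a batch is modelled as simply dropping it and
counting it. It assumes mark() is always called while reading the newest
batch. With keepForReset set and mark() never called (the left side of the
merge join, as described above), every batch stays in memory; with the flag
cleared (roughly the option proposed in the reply), only the current batch
is held.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a buffer that holds batches since the last mark().
 * Not Drill's actual RecordIterator; names and behaviour are assumptions.
 */
final class MarkPurgeBuffer<B> {
    // Batches from the marked batch (oldest) up to the newest one.
    private final Deque<B> held = new ArrayDeque<>();
    // Assumed option: when false, older batches are never kept for a reset.
    private final boolean keepForReset;
    private int releasedCount = 0;

    MarkPurgeBuffer(boolean keepForReset) {
        this.keepForReset = keepForReset;
    }

    /** Called when the join advances into a new incoming batch. */
    void add(B batch) {
        if (!keepForReset) {
            // Nothing will be revisited, so drop everything held so far.
            purgeAllBut(0);
        }
        held.addLast(batch);
    }

    /** Marks the current position (assumed to be in the newest batch). */
    void mark() {
        // Batches before the marked one can no longer be reached by a reset.
        purgeAllBut(1);
    }

    private void purgeAllBut(int keep) {
        while (held.size() > keep) {
            held.removeFirst(); // a real operator would free the batch's buffers here
            releasedCount++;
        }
    }

    int heldCount() {
        return held.size();
    }

    int releasedCount() {
        return releasedCount;
    }
}
```

In this sketch, a right-side buffer created with keepForReset = true grows
while a join key repeats across batches and shrinks back to one batch on
the next mark(). A left-side buffer created with keepForReset = true on
which mark() is never called grows without bound, which matches the
suspicion raised for DRILL-4190.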
