drill-dev mailing list archives

From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: Question about the RecordIterator
Date Tue, 15 Dec 2015 19:56:47 GMT
RecordIterator.mark() is only called for the right side of the merge join.
What about the left side: do we ever release the batches on the left side?
In DRILL-4190 the sort that runs out of memory is on the left side of the merge.
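
To make the retention question concrete, here is a minimal sketch of a
buffer that only releases batches when its mark advances. It is not Drill's
RecordIterator: the class MarkedBatchBuffer and its methods add(), mark(),
heldCount() and releasedCount() are hypothetical names invented for this
illustration, and batches are plain strings rather than real record batches.
It only shows why, under that assumption, a side on which mark() is never
called would keep every batch it has received.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; this is NOT Drill's RecordIterator.
// It models a buffer that keeps every batch from the current mark onward
// and releases older batches only when the mark moves forward.
final class MarkedBatchBuffer {
    // Batches at or after the mark, oldest first.
    private final Deque<String> held = new ArrayDeque<>();
    private long firstHeldIndex = 0; // global index of the oldest held batch
    private long nextIndex = 0;      // index the next incoming batch will get
    private long released = 0;       // how many batches have been released

    // Accept the next incoming batch and return its global index.
    long add(String batch) {
        held.addLast(batch);
        return nextIndex++;
    }

    // Move the mark to batchIndex and release every batch before it.
    void mark(long batchIndex) {
        if (batchIndex < firstHeldIndex || batchIndex > nextIndex) {
            throw new IllegalArgumentException("mark out of range: " + batchIndex);
        }
        while (firstHeldIndex < batchIndex) {
            held.removeFirst(); // a real operator would free the batch's buffers here
            firstHeldIndex++;
            released++;
        }
    }

    int heldCount() {
        return held.size();
    }

    long releasedCount() {
        return released;
    }

    public static void main(String[] args) {
        MarkedBatchBuffer buf = new MarkedBatchBuffer();
        for (int i = 0; i < 4; i++) {
            buf.add("batch-" + i);
        }
        buf.mark(3); // batches 0..2 are released, batch 3 stays
        System.out.println("held after mark: " + buf.heldCount());

        // If the mark never advances again, everything added later is retained.
        for (int i = 4; i < 9; i++) {
            buf.add("batch-" + i);
        }
        System.out.println("held without further marks: " + buf.heldCount());
    }
}
```

In this sketch memory use is bounded by the distance between the mark and
the newest batch, so a long run without a mark() call grows the buffer
without limit. Whether the left side of the merge join behaves this way is
exactly the open question above.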

On Tue, Dec 15, 2015 at 11:51 AM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:

> I see, it's in RecordIterator.mark()
>
> On Tue, Dec 15, 2015 at 11:50 AM, Abdel Hakim Deneche <
> adeneche@maprtech.com> wrote:
>
>> Amit,
>>
>> Thanks for the prompt answer. Can you point me to where in the code the
>> purge is done?
>>
>>
>>
>> On Tue, Dec 15, 2015 at 11:42 AM, Amit Hadke <amit.hadke@gmail.com>
>> wrote:
>>
>>> Hi Hakim,
>>> RecordIterator will not hold all batches in memory. It holds batches from
>>> the last mark() operation onward.
>>> It will purge batches as the join moves along.
>>>
>>> The worst case is when there are lots of repeating values on the right side,
>>> which the iterator will hold in memory.
>>>
>>> ~ Amit.
>>>
>>> On Tue, Dec 15, 2015 at 11:23 AM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:
>>>
>>> > Amit,
>>> >
>>> > I am looking at DRILL-4190, where one of the sort operators is hitting its
>>> > allocator limit when it's sending data downstream. This generally happens
>>> > when a downstream operator is holding those batches in memory (e.g. the
>>> > Window Operator).
>>> >
>>> > The same query runs fine on 1.2.0, which seems to suggest that the
>>> > recent changes to MergeJoinBatch "may" be causing the issue.
>>> >
>>> > It looks like RecordIterator is holding all incoming batches in a
>>> > TreeRangeMap and, if I'm not mistaken, it doesn't release anything until
>>> > it's closed. Is this correct?
>>> >
>>> > I am not familiar with how merge join used to work before RecordIterator.
>>> > Was it also the case that we held all incoming batches in memory?
>>> >
>>> > Thanks
>>> >
>>> > --
>>> >
>>> > Abdelhakim Deneche
>>> >
>>> > Software Engineer
>>> >
>>> >   <http://www.mapr.com/>
>>> >
>>> >
>>> > Now Available - Free Hadoop On-Demand Training
>>> > <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
>>> >
>>>
>>
>



