drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hanifi GUNES <hanifigu...@gmail.com>
Subject Re: ExternalSort doesn't properly account for sliced buffers
Date Fri, 20 Nov 2015 22:50:47 GMT
Problem with the above code is that not all vectors operate on root buffers
rendering the accounting above inaccurate. In fact your example is one
perfect instance where vectors would point to non-root buffers for sure
because of the slicing taking place at RecordBatchLoader#load [1]


1:
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java#L117

2015-11-20 11:41 GMT-08:00 Steven Phillips <steven@dremio.com>:

> I think it is because we can't actually properly account for sliced
> buffers. I don't remember for sure, but I think it might be because calling
> buf.capacity() on a sliced buffer returns the the capacity of root buffer,
> not the size of the slice. That may not be correct, but I think it was
> something like that. Whatever it is, I am pretty sure it was giving wrong
> results when they are sliced buffers.
>
> I think we need to get the new allocator, along with proper transfer of
> ownership in order to do this correctly. Then we can just query the
> allocator rather than trying to track it separately.
>
> On Fri, Nov 20, 2015 at 11:25 AM, Abdel Hakim Deneche <
> adeneche@maprtech.com
> > wrote:
>
> > I'm looking at the external sort code and it uses the following method to
> > compute the allocated size of a batch:
> >
> >   private long getBufferSize(VectorAccessible batch) {
> > >     long size = 0;
> > >     for (VectorWrapper<?> w : batch) {
> > >       DrillBuf[] bufs = w.getValueVector().getBuffers(false);
> > >       for (DrillBuf buf : bufs) {
> > >         if (*buf.isRootBuffer()*) {
> > >           size += buf.capacity();
> > >         }
> > >       }
> > >     }
> > >     return size;
> > >   }
> >
> >
> > This method only accounts for root buffers, but when we have a receiver
> > below the sort, most of (if not all) buffers are child buffers. This may
> > delay spilling, and increase the memory usage of the drillbit. If my
> > computations are correct, for a single query, one drillbit can allocate
> up
> > to 40GB without spilling once to disk.
> >
> > Is there a specific reason we only account for root buffers ?
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   <http://www.mapr.com/>
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message