drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: ExternalSort doesn't properly account for sliced buffers
Date Mon, 23 Nov 2015 17:08:10 GMT
I create DRILL-4121 <https://issues.apache.org/jira/browse/DRILL-4121> to
track this issue

On Fri, Nov 20, 2015 at 5:10 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:

> @Abdel back to your question
>
> *Is there a specific reason we only account for root buffers ?*
>
> My understanding was that we'd like to avoid double counting as slices are
> zero copy at any point you'd either account root buffer or slices but not
> both together. That being said, however, I do not see any reason why a
> vector would internally slice its underlying buffer except an invocation to
> VV#load in which case the vector is given a non-root buffer as well. So
> making this check on root buffer-ness seems unneeded to me since
> VV#getBuffers() should never return a root buffer and a slice from the same
> buffer. I would like to be proven wrong though.
>
>
> I guess, perhaps, Steven is referring to previous implementation of
> DrillBuf but returning slice width rather than root buffer width seems
> conforming to ByteBuf#capacity(), which is the current code anyway.
>
> On Fri, Nov 20, 2015 at 4:26 PM, Abdel Hakim Deneche <
> adeneche@maprtech.com>
> wrote:
>
> > I'm confused here. Looking at DrillBuf.capacity() it already returns the
> > length of the slice:
> >
> >   @Override
> > >   public int capacity() {
> > >     return length;
> > >   }
> >
> >
> >
> >
> > On Fri, Nov 20, 2015 at 4:11 PM, Hanifi GUNES <hanifigunes@gmail.com>
> > wrote:
> >
> > > Seems like I am missing some input here. Are we talking about changing
> > the
> > > behavior of DrillBuf#capacity()? If so, I would +1 the following:
> > >
> > > *- It seems like capacity should return the length of the slice (since
> > > I believe that fits with the general ByteBuf interface). If I
> > > reset writerIndex(0), I should be able write to capacity() without
> > issue.*
> > >
> > > 2015-11-20 16:03 GMT-08:00 Jacques Nadeau <jacques@dremio.com>:
> > >
> > > > My gut is that any changes other than correct accounting will have
> > little
> > > > impact on real world situations. Do you think the change will make
> > enough
> > > > difference to be valuable?
> > > >
> > > > It seems like capacity should return the length of the slice (since I
> > > > believe that fits with the general ByteBuf interface). If I reset
> > > > writerIndex(0), I should be able write to capacity() without issue.
> > Seems
> > > > weird to return some other (mostly disconnect) value.
> > > >
> > > >
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Fri, Nov 20, 2015 at 3:28 PM, Abdel Hakim Deneche <
> > > > adeneche@maprtech.com>
> > > > wrote:
> > > >
> > > > > @Steven, I think DrillBuf.capacity() was changed at some point, I
> was
> > > > > looking at the code and it seems to only return the size of the
> > "slice"
> > > > and
> > > > > not the root buffer.
> > > > >
> > > > > While waiting for the new allocator and transfer of ownership,
> would
> > it
> > > > > make sense to remove the check for root buffers like this ?
> > > > >
> > > > >   private long getBufferSize(VectorAccessible batch) {
> > > > > >     long size = 0;
> > > > > >     for (VectorWrapper<?> w : batch) {
> > > > > >       DrillBuf[] bufs = w.getValueVector().getBuffers(false);
> > > > > >       for (DrillBuf buf : bufs) {
> > > > > >         size += buf.capacity();
> > > > > >       }
> > > > > >     }
> > > > > >     return size;
> > > > > >   }
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Nov 20, 2015 at 2:50 PM, Hanifi GUNES <
> hanifigunes@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Problem with the above code is that not all vectors operate
on
> root
> > > > > buffers
> > > > > > rendering the accounting above inaccurate. In fact your example
> is
> > > one
> > > > > > perfect instance where vectors would point to non-root buffers
> for
> > > sure
> > > > > > because of the slicing taking place at RecordBatchLoader#load
[1]
> > > > > >
> > > > > >
> > > > > > 1:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java#L117
> > > > > >
> > > > > > 2015-11-20 11:41 GMT-08:00 Steven Phillips <steven@dremio.com>:
> > > > > >
> > > > > > > I think it is because we can't actually properly account
for
> > sliced
> > > > > > > buffers. I don't remember for sure, but I think it might
be
> > because
> > > > > > calling
> > > > > > > buf.capacity() on a sliced buffer returns the the capacity
of
> > root
> > > > > > buffer,
> > > > > > > not the size of the slice. That may not be correct, but
I think
> > it
> > > > was
> > > > > > > something like that. Whatever it is, I am pretty sure it
was
> > giving
> > > > > wrong
> > > > > > > results when they are sliced buffers.
> > > > > > >
> > > > > > > I think we need to get the new allocator, along with proper
> > > transfer
> > > > of
> > > > > > > ownership in order to do this correctly. Then we can just
query
> > the
> > > > > > > allocator rather than trying to track it separately.
> > > > > > >
> > > > > > > On Fri, Nov 20, 2015 at 11:25 AM, Abdel Hakim Deneche <
> > > > > > > adeneche@maprtech.com
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > I'm looking at the external sort code and it uses
the
> following
> > > > > method
> > > > > > to
> > > > > > > > compute the allocated size of a batch:
> > > > > > > >
> > > > > > > >   private long getBufferSize(VectorAccessible batch)
{
> > > > > > > > >     long size = 0;
> > > > > > > > >     for (VectorWrapper<?> w : batch) {
> > > > > > > > >       DrillBuf[] bufs =
> w.getValueVector().getBuffers(false);
> > > > > > > > >       for (DrillBuf buf : bufs) {
> > > > > > > > >         if (*buf.isRootBuffer()*) {
> > > > > > > > >           size += buf.capacity();
> > > > > > > > >         }
> > > > > > > > >       }
> > > > > > > > >     }
> > > > > > > > >     return size;
> > > > > > > > >   }
> > > > > > > >
> > > > > > > >
> > > > > > > > This method only accounts for root buffers, but when
we have
> a
> > > > > receiver
> > > > > > > > below the sort, most of (if not all) buffers are child
> buffers.
> > > > This
> > > > > > may
> > > > > > > > delay spilling, and increase the memory usage of the
> drillbit.
> > If
> > > > my
> > > > > > > > computations are correct, for a single query, one
drillbit
> can
> > > > > allocate
> > > > > > > up
> > > > > > > > to 40GB without spilling once to disk.
> > > > > > > >
> > > > > > > > Is there a specific reason we only account for root
buffers ?
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Abdelhakim Deneche
> > > > > > > >
> > > > > > > > Software Engineer
> > > > > > > >
> > > > > > > >   <http://www.mapr.com/>
> > > > > > > >
> > > > > > > >
> > > > > > > > Now Available - Free Hadoop On-Demand Training
> > > > > > > > <
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Abdelhakim Deneche
> > > > >
> > > > > Software Engineer
> > > > >
> > > > >   <http://www.mapr.com/>
> > > > >
> > > > >
> > > > > Now Available - Free Hadoop On-Demand Training
> > > > > <
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   <http://www.mapr.com/>
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >
> >
>



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message