drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jacques Nadeau <jacq...@dremio.com>
Subject Re: zeroVectors() interface for value vectors
Date Tue, 25 Aug 2015 22:41:46 GMT
I think it would be worthwhile to research when this was added and why.  I
took a quick glance but hadn't figured out why this was introduced.  At
first glance, it seems like it was done to fix what is most likely a bug
elsewhere.

@Chris, agreed.  The first goal was to at least get them documented in this
thread.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Aug 25, 2015 at 10:46 AM, Abdel Hakim Deneche <adeneche@maprtech.com
> wrote:

> One more question about the transition from allocate -> mutate. For Fixed
> width vectors and BitVector you can actually call setSafe() without calling
> allocateNew() first and it will work. Should it throw an exception instead
> ?
> not calling allocateNew() has side effects that could cause setSafe() to
> throw an OversizedAllocationException if you call setSafe() then clear()
> multiple times.
>
> On Tue, Aug 25, 2015 at 10:01 AM, Chris Westin <chriswestin42@gmail.com>
> wrote:
>
> > Maybe we should start by putting these rules in a comment in the value
> > vector base interfaces? The lack of such information is why there are
> > deviations and other expectations.
> >
> > On Tue, Aug 25, 2015 at 8:22 AM, Jacques Nadeau <jacques@dremio.com>
> > wrote:
> >
> > > There are a few unspoken "rules" around vectors:
> > >
> > > - values need to be written in order (e.g. index 0, 1, 2, 5)
> > > - null vectors start with all values as null before writing anything
> > > - for variable width types, the offset vector should be all zeros
> before
> > > writing
> > > - you must call setValueCount before a vector can be read
> > > - you should never write to a vector once it has been read.
> > >
> > > The ultimate goal we should get to the point where you the interfaces
> > > guarantee this order of operation:
> > >
> > > allocate > mutate > setvaluecount > access > clear (or allocate
to
> start
> > > the process over, xxx).  Any deviation from this pattern should result
> in
> > > exception.  We should do this only in debug mode as this code is
> > extremely
> > > performance sensitive.  Operations like transfer should be built on top
> > of
> > > this state model.  (In that case, it would mean src moves to clear
> state
> > > and target moves to access state.  It also means that transfer should
> > only
> > > work in access state.)
> > >
> > > If we need special purpose data structures that don't operate in these
> > > ways, we should make sure to keep them separate rather than trying to
> > > accommodate a deviation from this pattern in the core vector code.
> > >
> > > I wrote xxx above because I see the purpose of zeroVectors as being a
> > reset
> > > on the vector state back to the original state.  Maybe we should
> actually
> > > call it 'reset' rather than 'zeroVectors'.  This would basically pick
> up
> > at
> > > mutate mode again.
> > >
> > > Since these rules were never formalized, I'm sure there are a few
> places
> > > where we currently deviate.  We should enforce these rules and then get
> > > those issues fixed.
> > >
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Tue, Aug 25, 2015 at 8:02 AM, Abdel Hakim Deneche <
> > > adeneche@maprtech.com>
> > > wrote:
> > >
> > > > Another important point to keep in mind here:
> > ValueVectorWriteExpression
> > > > operates under the "undocumented" assumption that the destination
> > vector
> > > is
> > > > empty, this way it can safely skip writing null values. In the case
> of
> > > > window functions I am using a value vector as an internal buffer to
> > hold
> > > > values between batches which voids the assumption.
> > > > If this assumption is indeed correct then adding zeroVector to value
> > > > vectors is indeed the way to go.
> > > >
> > > > On Mon, Aug 24, 2015 at 8:30 PM, Jacques Nadeau <jacques@dremio.com>
> > > > wrote:
> > > >
> > > > > In general, let's try to avoid extending the core structures like
> > value
> > > > > vector read and write expressions for a single operator. Zerovector
> > is
> > > > > trivial to implement so let's resolve that way (trivial since the
> > > > > underlying vector already has it and we just need to delegate
> down).
> > > > > On Aug 24, 2015 3:36 PM, "Aman Sinha" <amansinha@apache.org>
> wrote:
> > > > >
> > > > > > I am reviewing Hakim's patch for DRILL-3668 (first_value window
> > > > function
> > > > > > incorrect result).  His code uses ValueVectorWriteExpression
to
> set
> > > > > values
> > > > > > in an internal batch which get re-used across different
> partitions
> > of
> > > > the
> > > > > > window function.  Ideally, we just want to zero out the vector
> > rather
> > > > > than
> > > > > > calling clear() since clear() will release the buffer.
> > > > > >
> > > > > > However, currently zeroVectors() is only supported by
> > > FixedWidthVector,
> > > > > not
> > > > > > VariableWidthVector.  * Should there be such an interface for
> > > variable
> > > > > > width ? * The implementation could zero out just the offset
> vector.
> > > > > >
> > > > > > In the absence of such an interface, Hakim has added a boolean
> flag
> > > > > > witeNulls to ValueVectorWriteExpression (see
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/adeneche/incubator-drill/commit/cab73cd1a50163dd25fe0f9c55c264780ea3616d
> > > > > > )
> > > > > >  and is conditionally doing the null-ing out in the generated
> code.
> > > It
> > > > > > won't affect the normal code path, it would get used for specific
> > > > window
> > > > > > functions.
> > > > > >
> > > > > > I am thinking of committing his patch and tracking the
> > zeroVectors()
> > > > > > enhancement separately (if people agree that it would be useful).
> > > Let
> > > > me
> > > > > > know...
> > > > > >
> > > > > > Aman
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Abdelhakim Deneche
> > > >
> > > > Software Engineer
> > > >
> > > >   <http://www.mapr.com/>
> > > >
> > > >
> > > > Now Available - Free Hadoop On-Demand Training
> > > > <
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > >
> > > >
> > >
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   <http://www.mapr.com/>
>
>
> Now Available - Free Hadoop On-Demand Training
> <
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message