drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jacques Nadeau <jacq...@dremio.com>
Subject Re: zeroVectors() interface for value vectors
Date Wed, 26 Aug 2015 02:15:42 GMT
Yes, by recommendation is to correct the usage in StreamingAggBatch

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Aug 25, 2015 at 4:52 PM, Abdel Hakim Deneche <adeneche@maprtech.com>
wrote:

> I think zeroVector() is mainly used to fill the vector with zeros, which is
> fine if you call it while the vector is in "mutate" state, but
> StreamingAggBatch does actually call it after setting the value count of
> the value vector which is against the paradigm.
>
>
> On Tue, Aug 25, 2015 at 3:51 PM, Jacques Nadeau <jacques@dremio.com>
> wrote:
>
> > In all but one situations, this is an internal concern (making sure to
> zero
> > out the memory).  For fixed width vectors, there is an assumption that an
> > initial allocation is clean memory (e.g. all zeros in the faces of an int
> > vector).  So this should be pulled off a public vector interface.  The
> one
> > place where it is being used today is StreamingAggBatch and I think we
> > should fix that to follow the state paradigm described above.
> >
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Tue, Aug 25, 2015 at 3:41 PM, Abdel Hakim Deneche <
> > adeneche@maprtech.com>
> > wrote:
> >
> > > Another question: FixedWidthVector interface defines a zeroVector()
> > method
> > > that
> > > "Zero out the underlying buffer backing this vector" according to it's
> > > javadoc.
> > >
> > > Where does this method fit in the value vector states described
> earlier ?
> > > it doesn't clear the vector yet it doesn't reset everything to the
> after
> > > allocate state.
> > >
> > > On Tue, Aug 25, 2015 at 10:46 AM, Abdel Hakim Deneche <
> > > adeneche@maprtech.com
> > > > wrote:
> > >
> > > > One more question about the transition from allocate -> mutate. For
> > Fixed
> > > > width vectors and BitVector you can actually call setSafe() without
> > > calling
> > > > allocateNew() first and it will work. Should it throw an exception
> > > instead
> > > > ?
> > > > not calling allocateNew() has side effects that could cause setSafe()
> > to
> > > > throw an OversizedAllocationException if you call setSafe() then
> > clear()
> > > > multiple times.
> > > >
> > > > On Tue, Aug 25, 2015 at 10:01 AM, Chris Westin <
> > chriswestin42@gmail.com>
> > > > wrote:
> > > >
> > > >> Maybe we should start by putting these rules in a comment in the
> value
> > > >> vector base interfaces? The lack of such information is why there
> are
> > > >> deviations and other expectations.
> > > >>
> > > >> On Tue, Aug 25, 2015 at 8:22 AM, Jacques Nadeau <jacques@dremio.com
> >
> > > >> wrote:
> > > >>
> > > >> > There are a few unspoken "rules" around vectors:
> > > >> >
> > > >> > - values need to be written in order (e.g. index 0, 1, 2, 5)
> > > >> > - null vectors start with all values as null before writing
> anything
> > > >> > - for variable width types, the offset vector should be all zeros
> > > before
> > > >> > writing
> > > >> > - you must call setValueCount before a vector can be read
> > > >> > - you should never write to a vector once it has been read.
> > > >> >
> > > >> > The ultimate goal we should get to the point where you the
> > interfaces
> > > >> > guarantee this order of operation:
> > > >> >
> > > >> > allocate > mutate > setvaluecount > access > clear
(or allocate to
> > > start
> > > >> > the process over, xxx).  Any deviation from this pattern should
> > result
> > > >> in
> > > >> > exception.  We should do this only in debug mode as this code
is
> > > >> extremely
> > > >> > performance sensitive.  Operations like transfer should be built
> on
> > > top
> > > >> of
> > > >> > this state model.  (In that case, it would mean src moves to
clear
> > > state
> > > >> > and target moves to access state.  It also means that transfer
> > should
> > > >> only
> > > >> > work in access state.)
> > > >> >
> > > >> > If we need special purpose data structures that don't operate
in
> > these
> > > >> > ways, we should make sure to keep them separate rather than trying
> > to
> > > >> > accommodate a deviation from this pattern in the core vector
code.
> > > >> >
> > > >> > I wrote xxx above because I see the purpose of zeroVectors as
> being
> > a
> > > >> reset
> > > >> > on the vector state back to the original state.  Maybe we should
> > > >> actually
> > > >> > call it 'reset' rather than 'zeroVectors'.  This would basically
> > pick
> > > >> up at
> > > >> > mutate mode again.
> > > >> >
> > > >> > Since these rules were never formalized, I'm sure there are a
few
> > > places
> > > >> > where we currently deviate.  We should enforce these rules and
> then
> > > get
> > > >> > those issues fixed.
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Jacques Nadeau
> > > >> > CTO and Co-Founder, Dremio
> > > >> >
> > > >> > On Tue, Aug 25, 2015 at 8:02 AM, Abdel Hakim Deneche <
> > > >> > adeneche@maprtech.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Another important point to keep in mind here:
> > > >> ValueVectorWriteExpression
> > > >> > > operates under the "undocumented" assumption that the
> destination
> > > >> vector
> > > >> > is
> > > >> > > empty, this way it can safely skip writing null values.
In the
> > case
> > > of
> > > >> > > window functions I am using a value vector as an internal
buffer
> > to
> > > >> hold
> > > >> > > values between batches which voids the assumption.
> > > >> > > If this assumption is indeed correct then adding zeroVector
to
> > value
> > > >> > > vectors is indeed the way to go.
> > > >> > >
> > > >> > > On Mon, Aug 24, 2015 at 8:30 PM, Jacques Nadeau <
> > jacques@dremio.com
> > > >
> > > >> > > wrote:
> > > >> > >
> > > >> > > > In general, let's try to avoid extending the core structures
> > like
> > > >> value
> > > >> > > > vector read and write expressions for a single operator.
> > > Zerovector
> > > >> is
> > > >> > > > trivial to implement so let's resolve that way (trivial
since
> > the
> > > >> > > > underlying vector already has it and we just need to
delegate
> > > down).
> > > >> > > > On Aug 24, 2015 3:36 PM, "Aman Sinha" <amansinha@apache.org>
> > > wrote:
> > > >> > > >
> > > >> > > > > I am reviewing Hakim's patch for DRILL-3668 (first_value
> > window
> > > >> > > function
> > > >> > > > > incorrect result).  His code uses ValueVectorWriteExpression
> > to
> > > >> set
> > > >> > > > values
> > > >> > > > > in an internal batch which get re-used across
different
> > > >> partitions of
> > > >> > > the
> > > >> > > > > window function.  Ideally, we just want to zero
out the
> vector
> > > >> rather
> > > >> > > > than
> > > >> > > > > calling clear() since clear() will release the
buffer.
> > > >> > > > >
> > > >> > > > > However, currently zeroVectors() is only supported
by
> > > >> > FixedWidthVector,
> > > >> > > > not
> > > >> > > > > VariableWidthVector.  * Should there be such an
interface
> for
> > > >> > variable
> > > >> > > > > width ? * The implementation could zero out just
the offset
> > > >> vector.
> > > >> > > > >
> > > >> > > > > In the absence of such an interface, Hakim has
added a
> boolean
> > > >> flag
> > > >> > > > > witeNulls to ValueVectorWriteExpression (see
> > > >> > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://github.com/adeneche/incubator-drill/commit/cab73cd1a50163dd25fe0f9c55c264780ea3616d
> > > >> > > > > )
> > > >> > > > >  and is conditionally doing the null-ing out in
the
> generated
> > > >> code.
> > > >> > It
> > > >> > > > > won't affect the normal code path, it would get
used for
> > > specific
> > > >> > > window
> > > >> > > > > functions.
> > > >> > > > >
> > > >> > > > > I am thinking of committing his patch and tracking
the
> > > >> zeroVectors()
> > > >> > > > > enhancement separately (if people agree that it
would be
> > > useful).
> > > >> > Let
> > > >> > > me
> > > >> > > > > know...
> > > >> > > > >
> > > >> > > > > Aman
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > >
> > > >> > > Abdelhakim Deneche
> > > >> > >
> > > >> > > Software Engineer
> > > >> > >
> > > >> > >   <http://www.mapr.com/>
> > > >> > >
> > > >> > >
> > > >> > > Now Available - Free Hadoop On-Demand Training
> > > >> > > <
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Abdelhakim Deneche
> > > >
> > > > Software Engineer
> > > >
> > > >   <http://www.mapr.com/>
> > > >
> > > >
> > > > Now Available - Free Hadoop On-Demand Training
> > > > <
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Abdelhakim Deneche
> > >
> > > Software Engineer
> > >
> > >   <http://www.mapr.com/>
> > >
> > >
> > > Now Available - Free Hadoop On-Demand Training
> > > <
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > >
> > >
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   <http://www.mapr.com/>
>
>
> Now Available - Free Hadoop On-Demand Training
> <
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message