drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jacques Nadeau <jacq...@dremio.com>
Subject Re: zeroVectors() interface for value vectors
Date Fri, 28 Aug 2015 20:08:52 GMT
I think the zero vectors approach is fine. I don't think we should merge
the value vector read/write expression changes.
On Aug 27, 2015 9:03 PM, "Aman Sinha" <asinha@maprtech.com> wrote:

> Based on the discussion it looks like doing what the first_value/last_value
> window function needs from the value vectors without violating the
> recommended state transition requires a bit more thought such that we don't
> introduce regression. Since testing is blocked on these for 1.2, can Hakim
> proceed with his current fix ?  We could create a JIRA to revisit it post
> 1.2...
>
> Aman
>
> On Wed, Aug 26, 2015 at 3:07 PM, Julien Le Dem <julien@dremio.com> wrote:
>
> > I can take a look at the Vectors and add asserts to enforce the contract
> is
> > respected.
> >
> > On Wed, Aug 26, 2015 at 2:52 PM, Steven Phillips <steven@dremio.com>
> > wrote:
> >
> > > One possible exception to the access pattern occurs when vectors wrap
> > other
> > > vectors. Specifically, the offset vectors in Variable Length and
> Repeated
> > > vectors. These vectors are accessed and mutated multiple times. If we
> are
> > > going to implement strict enforcement, we need to consider that case.
> > >
> > > On Tue, Aug 25, 2015 at 7:15 PM, Jacques Nadeau <jacques@dremio.com>
> > > wrote:
> > >
> > > > Yes, by recommendation is to correct the usage in StreamingAggBatch
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Tue, Aug 25, 2015 at 4:52 PM, Abdel Hakim Deneche <
> > > > adeneche@maprtech.com>
> > > > wrote:
> > > >
> > > > > I think zeroVector() is mainly used to fill the vector with zeros,
> > > which
> > > > is
> > > > > fine if you call it while the vector is in "mutate" state, but
> > > > > StreamingAggBatch does actually call it after setting the value
> count
> > > of
> > > > > the value vector which is against the paradigm.
> > > > >
> > > > >
> > > > > On Tue, Aug 25, 2015 at 3:51 PM, Jacques Nadeau <
> jacques@dremio.com>
> > > > > wrote:
> > > > >
> > > > > > In all but one situations, this is an internal concern (making
> sure
> > > to
> > > > > zero
> > > > > > out the memory).  For fixed width vectors, there is an assumption
> > > that
> > > > an
> > > > > > initial allocation is clean memory (e.g. all zeros in the faces
> of
> > an
> > > > int
> > > > > > vector).  So this should be pulled off a public vector interface.
> > > The
> > > > > one
> > > > > > place where it is being used today is StreamingAggBatch and
I
> think
> > > we
> > > > > > should fix that to follow the state paradigm described above.
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Jacques Nadeau
> > > > > > CTO and Co-Founder, Dremio
> > > > > >
> > > > > > On Tue, Aug 25, 2015 at 3:41 PM, Abdel Hakim Deneche <
> > > > > > adeneche@maprtech.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Another question: FixedWidthVector interface defines a
> > zeroVector()
> > > > > > method
> > > > > > > that
> > > > > > > "Zero out the underlying buffer backing this vector" according
> to
> > > > it's
> > > > > > > javadoc.
> > > > > > >
> > > > > > > Where does this method fit in the value vector states described
> > > > > earlier ?
> > > > > > > it doesn't clear the vector yet it doesn't reset everything
to
> > the
> > > > > after
> > > > > > > allocate state.
> > > > > > >
> > > > > > > On Tue, Aug 25, 2015 at 10:46 AM, Abdel Hakim Deneche <
> > > > > > > adeneche@maprtech.com
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > One more question about the transition from allocate
->
> mutate.
> > > For
> > > > > > Fixed
> > > > > > > > width vectors and BitVector you can actually call
setSafe()
> > > without
> > > > > > > calling
> > > > > > > > allocateNew() first and it will work. Should it throw
an
> > > exception
> > > > > > > instead
> > > > > > > > ?
> > > > > > > > not calling allocateNew() has side effects that could
cause
> > > > setSafe()
> > > > > > to
> > > > > > > > throw an OversizedAllocationException if you call
setSafe()
> > then
> > > > > > clear()
> > > > > > > > multiple times.
> > > > > > > >
> > > > > > > > On Tue, Aug 25, 2015 at 10:01 AM, Chris Westin <
> > > > > > chriswestin42@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Maybe we should start by putting these rules in
a comment in
> > the
> > > > > value
> > > > > > > >> vector base interfaces? The lack of such information
is why
> > > there
> > > > > are
> > > > > > > >> deviations and other expectations.
> > > > > > > >>
> > > > > > > >> On Tue, Aug 25, 2015 at 8:22 AM, Jacques Nadeau
<
> > > > jacques@dremio.com
> > > > > >
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > There are a few unspoken "rules" around vectors:
> > > > > > > >> >
> > > > > > > >> > - values need to be written in order (e.g.
index 0, 1, 2,
> 5)
> > > > > > > >> > - null vectors start with all values as null
before
> writing
> > > > > anything
> > > > > > > >> > - for variable width types, the offset vector
should be
> all
> > > > zeros
> > > > > > > before
> > > > > > > >> > writing
> > > > > > > >> > - you must call setValueCount before a vector
can be read
> > > > > > > >> > - you should never write to a vector once
it has been
> read.
> > > > > > > >> >
> > > > > > > >> > The ultimate goal we should get to the point
where you the
> > > > > > interfaces
> > > > > > > >> > guarantee this order of operation:
> > > > > > > >> >
> > > > > > > >> > allocate > mutate > setvaluecount >
access > clear (or
> > > allocate
> > > > to
> > > > > > > start
> > > > > > > >> > the process over, xxx).  Any deviation from
this pattern
> > > should
> > > > > > result
> > > > > > > >> in
> > > > > > > >> > exception.  We should do this only in debug
mode as this
> > code
> > > is
> > > > > > > >> extremely
> > > > > > > >> > performance sensitive.  Operations like transfer
should be
> > > built
> > > > > on
> > > > > > > top
> > > > > > > >> of
> > > > > > > >> > this state model.  (In that case, it would
mean src moves
> to
> > > > clear
> > > > > > > state
> > > > > > > >> > and target moves to access state.  It also
means that
> > transfer
> > > > > > should
> > > > > > > >> only
> > > > > > > >> > work in access state.)
> > > > > > > >> >
> > > > > > > >> > If we need special purpose data structures
that don't
> > operate
> > > in
> > > > > > these
> > > > > > > >> > ways, we should make sure to keep them separate
rather
> than
> > > > trying
> > > > > > to
> > > > > > > >> > accommodate a deviation from this pattern
in the core
> vector
> > > > code.
> > > > > > > >> >
> > > > > > > >> > I wrote xxx above because I see the purpose
of zeroVectors
> > as
> > > > > being
> > > > > > a
> > > > > > > >> reset
> > > > > > > >> > on the vector state back to the original
state.  Maybe we
> > > should
> > > > > > > >> actually
> > > > > > > >> > call it 'reset' rather than 'zeroVectors'.
 This would
> > > basically
> > > > > > pick
> > > > > > > >> up at
> > > > > > > >> > mutate mode again.
> > > > > > > >> >
> > > > > > > >> > Since these rules were never formalized,
I'm sure there
> are
> > a
> > > > few
> > > > > > > places
> > > > > > > >> > where we currently deviate.  We should enforce
these rules
> > and
> > > > > then
> > > > > > > get
> > > > > > > >> > those issues fixed.
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > --
> > > > > > > >> > Jacques Nadeau
> > > > > > > >> > CTO and Co-Founder, Dremio
> > > > > > > >> >
> > > > > > > >> > On Tue, Aug 25, 2015 at 8:02 AM, Abdel Hakim
Deneche <
> > > > > > > >> > adeneche@maprtech.com>
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Another important point to keep in mind
here:
> > > > > > > >> ValueVectorWriteExpression
> > > > > > > >> > > operates under the "undocumented" assumption
that the
> > > > > destination
> > > > > > > >> vector
> > > > > > > >> > is
> > > > > > > >> > > empty, this way it can safely skip writing
null values.
> In
> > > the
> > > > > > case
> > > > > > > of
> > > > > > > >> > > window functions I am using a value
vector as an
> internal
> > > > buffer
> > > > > > to
> > > > > > > >> hold
> > > > > > > >> > > values between batches which voids the
assumption.
> > > > > > > >> > > If this assumption is indeed correct
then adding
> > zeroVector
> > > to
> > > > > > value
> > > > > > > >> > > vectors is indeed the way to go.
> > > > > > > >> > >
> > > > > > > >> > > On Mon, Aug 24, 2015 at 8:30 PM, Jacques
Nadeau <
> > > > > > jacques@dremio.com
> > > > > > > >
> > > > > > > >> > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > In general, let's try to avoid
extending the core
> > > structures
> > > > > > like
> > > > > > > >> value
> > > > > > > >> > > > vector read and write expressions
for a single
> operator.
> > > > > > > Zerovector
> > > > > > > >> is
> > > > > > > >> > > > trivial to implement so let's resolve
that way
> (trivial
> > > > since
> > > > > > the
> > > > > > > >> > > > underlying vector already has it
and we just need to
> > > > delegate
> > > > > > > down).
> > > > > > > >> > > > On Aug 24, 2015 3:36 PM, "Aman
Sinha" <
> > > amansinha@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > I am reviewing Hakim's patch
for DRILL-3668
> > (first_value
> > > > > > window
> > > > > > > >> > > function
> > > > > > > >> > > > > incorrect result).  His code
uses
> > > > ValueVectorWriteExpression
> > > > > > to
> > > > > > > >> set
> > > > > > > >> > > > values
> > > > > > > >> > > > > in an internal batch which
get re-used across
> > different
> > > > > > > >> partitions of
> > > > > > > >> > > the
> > > > > > > >> > > > > window function.  Ideally,
we just want to zero out
> > the
> > > > > vector
> > > > > > > >> rather
> > > > > > > >> > > > than
> > > > > > > >> > > > > calling clear() since clear()
will release the
> buffer.
> > > > > > > >> > > > >
> > > > > > > >> > > > > However, currently zeroVectors()
is only supported
> by
> > > > > > > >> > FixedWidthVector,
> > > > > > > >> > > > not
> > > > > > > >> > > > > VariableWidthVector.  * Should
there be such an
> > > interface
> > > > > for
> > > > > > > >> > variable
> > > > > > > >> > > > > width ? * The implementation
could zero out just the
> > > > offset
> > > > > > > >> vector.
> > > > > > > >> > > > >
> > > > > > > >> > > > > In the absence of such an
interface, Hakim has
> added a
> > > > > boolean
> > > > > > > >> flag
> > > > > > > >> > > > > witeNulls to ValueVectorWriteExpression
(see
> > > > > > > >> > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/adeneche/incubator-drill/commit/cab73cd1a50163dd25fe0f9c55c264780ea3616d
> > > > > > > >> > > > > )
> > > > > > > >> > > > >  and is conditionally doing
the null-ing out in the
> > > > > generated
> > > > > > > >> code.
> > > > > > > >> > It
> > > > > > > >> > > > > won't affect the normal code
path, it would get used
> > for
> > > > > > > specific
> > > > > > > >> > > window
> > > > > > > >> > > > > functions.
> > > > > > > >> > > > >
> > > > > > > >> > > > > I am thinking of committing
his patch and tracking
> the
> > > > > > > >> zeroVectors()
> > > > > > > >> > > > > enhancement separately (if
people agree that it
> would
> > be
> > > > > > > useful).
> > > > > > > >> > Let
> > > > > > > >> > > me
> > > > > > > >> > > > > know...
> > > > > > > >> > > > >
> > > > > > > >> > > > > Aman
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > --
> > > > > > > >> > >
> > > > > > > >> > > Abdelhakim Deneche
> > > > > > > >> > >
> > > > > > > >> > > Software Engineer
> > > > > > > >> > >
> > > > > > > >> > >   <http://www.mapr.com/>
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > Now Available - Free Hadoop On-Demand
Training
> > > > > > > >> > > <
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Abdelhakim Deneche
> > > > > > > >
> > > > > > > > Software Engineer
> > > > > > > >
> > > > > > > >   <http://www.mapr.com/>
> > > > > > > >
> > > > > > > >
> > > > > > > > Now Available - Free Hadoop On-Demand Training
> > > > > > > > <
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Abdelhakim Deneche
> > > > > > >
> > > > > > > Software Engineer
> > > > > > >
> > > > > > >   <http://www.mapr.com/>
> > > > > > >
> > > > > > >
> > > > > > > Now Available - Free Hadoop On-Demand Training
> > > > > > > <
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Abdelhakim Deneche
> > > > >
> > > > > Software Engineer
> > > > >
> > > > >   <http://www.mapr.com/>
> > > > >
> > > > >
> > > > > Now Available - Free Hadoop On-Demand Training
> > > > > <
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Julien
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message