drill-dev mailing list archives

From Jacques Nadeau <jacq...@dremio.com>
Subject Re: Can we pass the #skipped records with RecordBatch?
Date Tue, 08 Dec 2015 19:00:11 GMT
Please see some initial thoughts attached. Would love feedback and thoughts
from others on how we can shape this.

https://gist.github.com/jacques-n/84b13e704e0e3829ca99

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Dec 3, 2015 at 8:17 AM, Zelaine Fong <zfong@maprtech.com> wrote:

> Yes, it would be great to get your thoughts so we can assess the scope of
> what's involved.
>
> Thanks.
>
> -- Zelaine
>
> On Wed, Dec 2, 2015 at 7:29 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>
> > Definitely agree that we shouldn't boil the ocean.  That said, I don't
> > think we should make RecordBatch interface changes without deliberate
> > design. Same for RPC protocol changes. Part of my internal struggle with
> > the warning patch is exactly this lack of broader design. I think this is
> > especially true given the drive to support backwards compatibility.
> >
> > I don't think we're talking about a massive undertaking. I'll try to write
> > up some thoughts later this week to get the ball rolling. Sound good?
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> > +1 on having a framework.
> > OTOH, as with the warnings implementation, we might want to go ahead with a
> > simpler implementation while we get a more generic framework design in
> > place.
> >
> > Jacques, do you have any preliminary thoughts on the framework?
> >
> > On Tue, Dec 1, 2015 at 2:08 PM, Julian Hyde <jhyde@apache.org> wrote:
> >
> > > +1 for a sideband mechanism.
> > >
> > > Sideband can also allow correlated restart of sub-queries.
> > >
> > > In sideband use cases you described, the messages ran in the opposite
> > > direction to the data. Would the sideband also run in the same direction as
> > > the data? If so it could carry warnings, rejected rows, progress
> > > indications, and (for online aggregation[1]) notifications that a better
> > > approximate query result is available.
> > >
> > > Julian
> > >
> > > [1] https://en.wikipedia.org/wiki/Online_aggregation
> > >
> > >
> > >
> > > > On Dec 1, 2015, at 1:51 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> > > >
> > > > This seems like a form of sideband communication. I think we should have a
> > > > framework for this type of thing in general rather than a one-off for this
> > > > particular need. Other forms of sideband might be small table bloomfilter
> > > > generation and pushdown into hbase, separate file assignment/partitioning
> > > > providers balancing/generating scanner workloads, statistics generation for
> > > > adaptive execution, etc.
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Tue, Dec 1, 2015 at 11:35 AM, Hsuan Yi Chu <hyichu@maprtech.com> wrote:
> > > >
> > > >> I am trying to deal with the following scenario:
> > > >>
> > > >> A bunch of minor fragments are doing things in parallel. Each of them could
> > > >> skip some records. Since the downstream minor fragment needs to know the
> > > >> sum of skipped-record-counts (in order to just display or see if the number
> > > >> exceeds the threshold) in the upstreams, each upstream minor fragment needs
> > > >> to pass this scalar with RecordBatch.
> > > >>
> > > >> Since this seems to impact the protocol of RecordBatch, I am looking for
> > > >> some advice here.
> > > >>
> > > >> Thanks.
> > > >>
> > >
> > >
> >
>
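
For illustration, the scenario in the original question (each upstream minor
fragment reports how many records it skipped, and the downstream fragment sums
those counts and checks them against a threshold) could be sketched roughly as
below. This is a minimal sketch under stated assumptions, not Drill code and
not the design in the gist: the names SkippedRecordsMessage,
SkippedRecordsAggregator and SkipThresholdExceeded are hypothetical, and the
count is modelled as a separate sideband message travelling in the same
direction as the data rather than as a field on RecordBatch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SkippedRecordsMessage:
    """Hypothetical sideband message from one upstream minor fragment."""
    fragment_id: int
    skipped: int  # records skipped since this fragment's previous message


class SkipThresholdExceeded(Exception):
    """Raised when the summed skip count goes over the allowed threshold."""


@dataclass
class SkippedRecordsAggregator:
    """Hypothetical downstream-side accumulator of skipped-record counts."""
    threshold: int
    per_fragment: dict = field(default_factory=dict)

    def total(self) -> int:
        # Sum of skipped records across all upstream fragments seen so far.
        return sum(self.per_fragment.values())

    def on_message(self, msg: SkippedRecordsMessage) -> int:
        # Accumulate the delta for the sending fragment.
        previous = self.per_fragment.get(msg.fragment_id, 0)
        self.per_fragment[msg.fragment_id] = previous + msg.skipped
        total = self.total()
        # Fail once the combined count exceeds the threshold (assumed semantics).
        if total > self.threshold:
            raise SkipThresholdExceeded(
                f"{total} records skipped, threshold is {self.threshold}"
            )
        return total


if __name__ == "__main__":
    agg = SkippedRecordsAggregator(threshold=10)
    agg.on_message(SkippedRecordsMessage(fragment_id=0, skipped=3))
    agg.on_message(SkippedRecordsMessage(fragment_id=1, skipped=4))
    print(agg.total())
```

Keeping the counter in its own message type means the data path is untouched;
whether that is preferable to extending RecordBatch is exactly the design
question the thread leaves open.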
