drill-dev mailing list archives

From Hsuan Yi Chu <hyi...@maprtech.com>
Subject Re: Can we pass the #skipped records with RecordBatch?
Date Wed, 02 Dec 2015 18:43:15 GMT
+1 on having a framework.

But what about the #skipped records use case, and the warnings one (
https://github.com/abhipol/drill/commit/137059cd44ec28e8ba3bf2aa73d2c1cbcd55d604
)?

Implementing the framework now sounds like good timing, because it can
serve both of those use cases in one shot.
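
To make the #skipped records use case concrete, here is a minimal sketch of what a sideband message and its downstream consumer could look like. Everything in it is hypothetical: SkippedCountMessage, SkippedCountCollector, and the threshold value are names made up for illustration, not existing Drill classes, and the sketch ignores how the message would actually travel between fragments.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only: none of these types exist in Drill.
class SkippedRecordsSketch {

    /** Out-of-band message: one upstream minor fragment reports how many records it skipped. */
    static final class SkippedCountMessage {
        final int minorFragmentId;
        final long skipped;

        SkippedCountMessage(int minorFragmentId, long skipped) {
            this.minorFragmentId = minorFragmentId;
            this.skipped = skipped;
        }
    }

    /** Downstream side: sums the counts from all upstreams and checks them against a threshold. */
    static final class SkippedCountCollector {
        private final long threshold;
        // AtomicLong because upstream messages may arrive on different threads.
        private final AtomicLong total = new AtomicLong();

        SkippedCountCollector(long threshold) {
            this.threshold = threshold;
        }

        void accept(SkippedCountMessage message) {
            total.addAndGet(message.skipped);
        }

        long total() {
            return total.get();
        }

        boolean exceedsThreshold() {
            return total.get() > threshold;
        }
    }

    public static void main(String[] args) {
        // Assumed threshold of 10 skipped records, chosen only for the example.
        SkippedCountCollector collector = new SkippedCountCollector(10);

        // Three upstream minor fragments report their counts independently of the data batches.
        collector.accept(new SkippedCountMessage(0, 3));
        collector.accept(new SkippedCountMessage(1, 4));
        collector.accept(new SkippedCountMessage(2, 5));

        System.out.println("total skipped = " + collector.total());
        System.out.println("exceeds threshold = " + collector.exceedsThreshold());
    }
}
```

The point of the sketch is that the count travels as its own message rather than as an extra field on RecordBatch, so the same channel could also carry warnings.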



On Tue, Dec 1, 2015 at 3:52 PM, Parth Chandra <parthc@apache.org> wrote:

> +1 on having a framework.
> OTOH, as with the warnings implementation, we might want to go ahead with a
> simpler implementation while we get a more generic framework design in
> place.
>
> Jacques, do you have any preliminary thoughts on the framework?
>
> On Tue, Dec 1, 2015 at 2:08 PM, Julian Hyde <jhyde@apache.org> wrote:
>
> > +1 for a sideband mechanism.
> >
> > Sideband can also allow correlated restart of sub-queries.
> >
> > In sideband use cases you described, the messages ran in the opposite
> > direction to the data. Would the sideband also run in the same direction as
> > the data? If so it could carry warnings, rejected rows, progress
> > indications, and (for online aggregation[1]) notifications that a better
> > approximate query result is available.
> >
> > Julian
> >
> > [1] https://en.wikipedia.org/wiki/Online_aggregation
> >
> >
> >
> > > On Dec 1, 2015, at 1:51 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> > >
> > > > This seems like a form of sideband communication. I think we should have
> > > > a framework for this type of thing in general rather than a one-off for
> > > > this particular need. Other forms of sideband might be small table
> > > > bloomfilter generation and pushdown into hbase, separate file
> > > > assignment/partitioning providers balancing/generating scanner workloads,
> > > > statistics generation for adaptive execution, etc.
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > > On Tue, Dec 1, 2015 at 11:35 AM, Hsuan Yi Chu <hyichu@maprtech.com> wrote:
> > >
> > >> I am trying to deal with the following scenario:
> > >>
> > > >> A bunch of minor fragments are doing things in parallel. Each of them
> > > >> could skip some records. Since the downstream minor fragment needs to
> > > >> know the sum of the skipped-record counts in the upstreams (either just
> > > >> to display it or to check whether it exceeds the threshold), each
> > > >> upstream minor fragment needs to pass this scalar with RecordBatch.
> > > >>
> > > >> Since this seems to impact the protocol of RecordBatch, I am looking for
> > > >> some advice here.
> > >>
> > >> Thanks.
> > >>
> >
> >
>
