drill-dev mailing list archives

From Hsuan Yi Chu <hyi...@maprtech.com>
Subject Re: Can we pass the #skipped records with RecordBatch?
Date Tue, 01 Dec 2015 22:16:21 GMT
@Julian:
In the use case, the message flows in the same direction as the data. And
you are right: with a sideband, a lot of additional information could be
carried and displayed to users.

I think we have a pull request specifically dealing with WARNING:
https://github.com/abhipol/drill/commit/137059cd44ec28e8ba3bf2aa73d2c1cbcd55d604

Let me see whether the two have anything in common that can be shared.


On Tue, Dec 1, 2015 at 2:08 PM, Julian Hyde <jhyde@apache.org> wrote:

> +1 for a sideband mechanism.
>
> Sideband can also allow correlated restart of sub-queries.
>
> In sideband use cases you described, the messages ran in the opposite
> direction to the data. Would the sideband also run in the same direction as
> the data? If so it could carry warnings, rejected rows, progress
> indications, and (for online aggregation[1]) notifications that a better
> approximate query result is available.
>
> Julian
>
> [1] https://en.wikipedia.org/wiki/Online_aggregation
>
>
>
> > On Dec 1, 2015, at 1:51 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> >
> > This seems like a form of sideband communication. I think we should have a
> > framework for this type of thing in general rather than a one-off for this
> > particular need. Other forms of sideband might be small table bloomfilter
> > generation and pushdown into hbase, separate file assignment/partitioning
> > providers balancing/generating scanner workloads, statistics generation for
> > adaptive execution, etc.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Tue, Dec 1, 2015 at 11:35 AM, Hsuan Yi Chu <hyichu@maprtech.com> wrote:
> >
> >> I am trying to deal with the following scenario:
> >>
> >> A bunch of minor fragments are doing things in parallel. Each of them could
> >> skip some records. Since the downstream minor fragment needs to know the
> >> sum of skipped-record-counts (in order to just display or see if the number
> >> exceeds the threshold) in the upstreams, each upstream minor fragment needs
> >> to pass this scalar with RecordBatch.
> >>
> >> Since this seems to impact the protocol of RecordBatch, I am looking for
> >> some advice here.
> >>
> >> Thanks.
> >>
>
>
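
As a rough illustration of the scenario in the original question (each
upstream minor fragment reports how many records it skipped, and the
downstream minor fragment sums those counts and compares the total with a
threshold), here is a minimal Java sketch. Every class, field, and method
name in it is hypothetical. None of them are Drill classes, and the sketch
does not show how Drill's RecordBatch protocol works or how a sideband
would actually be implemented.

```java
import java.util.List;

// Hypothetical names throughout; none of these are Drill classes.
final class SkippedRecordSketch {

    /** A data batch plus a scalar carried alongside it (stand-in for a sideband payload). */
    static final class BatchWithSkipCount {
        final List<String> rows;     // placeholder for the real record data
        final long skippedRecords;   // records the upstream fragment skipped for this batch

        BatchWithSkipCount(List<String> rows, long skippedRecords) {
            this.rows = rows;
            this.skippedRecords = skippedRecords;
        }
    }

    /** Downstream side: sums the counts from all upstream fragments and checks a threshold. */
    static final class SkipCountAggregator {
        private final long threshold;
        private long total;

        SkipCountAggregator(long threshold) {
            this.threshold = threshold;
        }

        /** Adds the skipped-record count carried with one incoming batch. */
        void accept(BatchWithSkipCount batch) {
            total += batch.skippedRecords;
        }

        long total() {
            return total;
        }

        /** True once the running sum is strictly greater than the threshold. */
        boolean thresholdExceeded() {
            return total > threshold;
        }
    }

    public static void main(String[] args) {
        SkipCountAggregator agg = new SkipCountAggregator(4);
        // Two batches, as if they arrived from two different upstream minor fragments.
        agg.accept(new BatchWithSkipCount(List.of("a", "b"), 2));
        agg.accept(new BatchWithSkipCount(List.of("c"), 3));
        System.out.println("skipped=" + agg.total() + " exceeded=" + agg.thresholdExceeded());
    }
}
```

In this sketch the count rides with each batch, which is the one-off
approach from the original question. A general sideband framework, as
suggested in the replies, would presumably deliver such a count as a
separate message type rather than as a field on the batch.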
