drill-dev mailing list archives

From Julian Hyde <jh...@apache.org>
Subject Re: Can we pass the #skipped records with RecordBatch?
Date Tue, 08 Dec 2015 21:01:17 GMT
It seems that SidebandTunnel is point-to-point. That is, there is one producer and one consumer.
No broadcast or topics (multiple consumers of the same message). Order is preserved. At-most-once
(i.e. may lose data in event of failure). Producer and consumer may be on the same node or
different nodes. Correct?

I’m not sure SidebandTunnel.close is necessary. I would presume that a SidebandTunnel is
closed when its associated statement is closed, and only then.

Also, would it be easier if the tunnels were defined as part of the DAG, and DAG initialization
time was the only time that they could be created?

Julian
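
A minimal sketch of the semantics described above, assuming a hypothetical in-process `SidebandTunnel` and an owning `Statement`. Neither class is taken from the gist or from Drill; the names, the `capacity` parameter, and the use of buffer overflow as a stand-in for a failure are all illustrative assumptions. It shows one producer and one consumer, preserved order, at-most-once delivery, tunnels created only at initialization time, and closure driven by the statement rather than by a public `close` on the tunnel.

```python
from collections import deque


class TunnelClosedError(Exception):
    """Raised when sending on a tunnel whose statement has been closed."""


class SidebandTunnel:
    """Hypothetical point-to-point tunnel: one producer, one consumer."""

    def __init__(self, capacity=4):
        self._queue = deque()
        self._capacity = capacity  # assumed bound; overflow models a failure
        self._closed = False
        self.dropped = 0

    def send(self, message):
        if self._closed:
            raise TunnelClosedError("statement already closed")
        if len(self._queue) >= self._capacity:
            # At-most-once: a message that cannot be buffered is lost,
            # never retried or duplicated.
            self.dropped += 1
            return False
        self._queue.append(message)
        return True

    def receive(self):
        # FIFO pop: order is preserved and each message is handed to the
        # single consumer at most once.
        return self._queue.popleft() if self._queue else None

    def _mark_closed(self):
        # No public close(): only the owning statement closes a tunnel.
        self._closed = True
        self._queue.clear()


class Statement:
    """Hypothetical statement that declares all its tunnels up front."""

    def __init__(self, tunnel_names):
        # Tunnels exist only if declared at initialization (the "DAG" time).
        self._tunnels = {name: SidebandTunnel() for name in tunnel_names}

    def tunnel(self, name):
        return self._tunnels[name]

    def close(self):
        for tunnel in self._tunnels.values():
            tunnel._mark_closed()
```

With this shape, a producer that sends six messages into a tunnel of capacity four delivers the first four in order and silently loses the last two, which is the trade-off that at-most-once delivery implies.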
  

> On Dec 8, 2015, at 11:00 AM, Jacques Nadeau <jacques@dremio.com> wrote:
> 
> Please see some initial thoughts attached. Would love feedback and thoughts
> from others on how we can shape this.
> 
> https://gist.github.com/jacques-n/84b13e704e0e3829ca99
> 
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> 
> On Thu, Dec 3, 2015 at 8:17 AM, Zelaine Fong <zfong@maprtech.com> wrote:
> 
>> Yes, it would be great to get your thoughts so we can assess the scope of
>> what's involved.
>> 
>> Thanks.
>> 
>> -- Zelaine
>> 
>> On Wed, Dec 2, 2015 at 7:29 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>> 
>>> Definitely agree that we shouldn't boil the ocean.  That said, I don't
>>> think we should make RecordBatch interface changes without deliberate
>>> design. Same for RPC protocol changes. Part of my internal struggle with
>>> the warning patch is exactly this lack of broader design. I think this is
>>> especially true given the drive to support backwards compatibility.
>>> 
>>> I don't think we're talking about a massive undertaking. I'll try to write
>>> up some thoughts later this week to get the ball rolling. Sound good?
>>> 
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>> +1 on having a framework.
>>> OTOH, as with the warnings implementation, we might want to go ahead with a
>>> simpler implementation while we get a more generic framework design in
>>> place.
>>> 
>>> Jacques, do you have any preliminary thoughts on the framework?
>>> 
>>> On Tue, Dec 1, 2015 at 2:08 PM, Julian Hyde <jhyde@apache.org> wrote:
>>> 
>>>> +1 for a sideband mechanism.
>>>> 
>>>> Sideband can also allow correlated restart of sub-queries.
>>>> 
>>>> In sideband use cases you described, the messages ran in the opposite
>>>> direction to the data. Would the sideband also run in the same direction as
>>>> the data? If so it could carry warnings, rejected rows, progress
>>>> indications, and (for online aggregation[1]) notifications that a better
>>>> approximate query result is available.
>>>> 
>>>> Julian
>>>> 
>>>> [1] https://en.wikipedia.org/wiki/Online_aggregation
>>>> 
>>>> 
>>>> 
>>>>> On Dec 1, 2015, at 1:51 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>>>>> 
>>>>> This seems like a form of sideband communication. I think we should have a
>>>>> framework for this type of thing in general rather than a one-off for this
>>>>> particular need. Other forms of sideband might be small table bloomfilter
>>>>> generation and pushdown into hbase, separate file assignment/partitioning
>>>>> providers balancing/generating scanner workloads, statistics generation for
>>>>> adaptive execution, etc.
>>>>> 
>>>>> --
>>>>> Jacques Nadeau
>>>>> CTO and Co-Founder, Dremio
>>>>> 
>>>>> On Tue, Dec 1, 2015 at 11:35 AM, Hsuan Yi Chu <hyichu@maprtech.com> wrote:
>>>>> 
>>>>>> I am trying to deal with the following scenario:
>>>>>> 
>>>>>> A bunch of minor fragments are doing things in parallel. Each of them could
>>>>>> skip some records. Since the downstream minor fragment needs to know the
>>>>>> sum of skipped-record-counts (in order to just display or see if the number
>>>>>> exceeds the threshold) in the upstreams, each upstream minor fragment needs
>>>>>> to pass this scalar with RecordBatch.
>>>>>> 
>>>>>> Since this seems to impact the protocol of RecordBatch, I am looking for
>>>>>> some advice here.
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>> 
>>>> 
>>> 
>> 
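
The scenario in the original question (several upstream minor fragments each skipping records, with the downstream fragment needing the sum and a threshold check) can be sketched as follows. `SkippedRecordAggregator`, its methods, and the choice to report per-fragment deltas are hypothetical and only illustrate the bookkeeping; how the counts actually travel (inside RecordBatch or over a sideband) is exactly what the thread leaves open.

```python
class SkippedRecordAggregator:
    """Hypothetical downstream-side tally of records skipped upstream."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._counts = {}  # minor fragment id -> skipped records so far

    def report(self, fragment_id, skipped):
        # Assumption: each report is a delta, so a fragment may report
        # several times while it is still running.
        self._counts[fragment_id] = self._counts.get(fragment_id, 0) + skipped

    @property
    def total(self):
        # The scalar the downstream fragment wants to display.
        return sum(self._counts.values())

    def exceeds_threshold(self):
        return self.total > self.threshold


# Usage: three upstream fragments report skipped-record deltas.
aggregator = SkippedRecordAggregator(threshold=10)
aggregator.report(fragment_id=0, skipped=3)
aggregator.report(fragment_id=1, skipped=4)
aggregator.report(fragment_id=0, skipped=2)
```

Keeping the tally keyed by fragment rather than as a single running number makes it possible to show which fragment skipped the most records, at the cost of a small map on the downstream side.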

