drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5839) Handle Empty Batches in Merge Receiver
Date Thu, 05 Oct 2017 21:03:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193670#comment-16193670 ]

ASF GitHub Bot commented on DRILL-5839:
---------------------------------------

Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/974
  
    @paul-rogers Thank you Paul. I made the change and pushed the new diffs. 


> Handle Empty Batches in Merge Receiver
> --------------------------------------
>
>                 Key: DRILL-5839
>                 URL: https://issues.apache.org/jira/browse/DRILL-5839
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.11.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>             Fix For: 1.12.0
>
>
> The merge receiver throws an exception when the first batch it receives from any of its senders is an empty batch (no rows and no schema). The problem is that the operator expects at least one batch with a schema (0 rows is OK, 0 columns is not) from each of its senders.
> The algorithm works as follows:
> Get the first batch from each of the senders.
> Create a hyper vector container with this first batch from each of the senders.
> Add the batches from the senders to the priority queue.
> Pop from the priority queue, get the index for the current batch from that sender, and use it to copy from the hyper vector to the outgoing vector.
> When the end of a batch from a sender is reached, load the next batch from that sender.
> Stop when there are no more batches from any of the senders.
> If any sender does not send a first batch with a schema and we skip adding that batch to the hyper vector, the hyper vector is not set up correctly, and all the offsets from the selection vector to the individual sender batches within the hyper vector are wrong.
> The fix is that when we receive an empty batch from any of the senders, we create a dummy batch with the schema from one of the other senders and add it to the hyper vector.
> If all senders send empty first batches, we just return NONE to the downstream operator.
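
For readers unfamiliar with the operator, the merge steps and the fix described in the issue can be sketched roughly as follows. This is not Drill's code: `MergeSketch`, `Batch`, `Cursor`, and `advance` are hypothetical names, rows are plain sorted ints, and the `slots` array merely stands in for the hyper vector container (one slot per sender, so slot i always belongs to sender i).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class MergeSketch {
    /** Hypothetical stand-in for a record batch: schema is null when the batch carries none. */
    static final class Batch {
        final String schema;
        final int[] rows; // assumed sorted ascending within a batch
        Batch(String schema, int... rows) { this.schema = schema; this.rows = rows; }
        static Batch empty() { return new Batch(null); } // no rows and no schema
    }

    /** Position of one sender: which batch it is on and which row inside that batch. */
    static final class Cursor {
        final int sender; int batch; int row;
        Cursor(int sender) { this.sender = sender; }
    }

    static List<Integer> merge(List<List<Batch>> senders) {
        List<Integer> out = new ArrayList<>();
        // Look for a schema in any sender's first batch.
        String schema = null;
        for (List<Batch> s : senders) {
            if (!s.isEmpty() && s.get(0).schema != null) { schema = s.get(0).schema; break; }
        }
        if (schema == null) return out; // every first batch was empty: the NONE case

        // Fill one slot per sender; an empty first batch is replaced by a dummy
        // zero-row batch with the borrowed schema, so no slot is ever skipped.
        Batch[] slots = new Batch[senders.size()];
        for (int i = 0; i < slots.length; i++) {
            List<Batch> s = senders.get(i);
            boolean usable = !s.isEmpty() && s.get(0).schema != null;
            slots[i] = usable ? s.get(0) : new Batch(schema);
        }

        // Order cursors by the row they currently point at.
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
            (Cursor a, Cursor b) -> Integer.compare(
                slots[a.sender].rows[a.row], slots[b.sender].rows[b.row]));
        for (int i = 0; i < slots.length; i++) {
            Cursor c = new Cursor(i);
            if (advance(c, slots, senders)) queue.add(c);
        }
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            out.add(slots[c.sender].rows[c.row]); // copy to the outgoing result
            c.row++;
            if (advance(c, slots, senders)) queue.add(c);
        }
        return out;
    }

    /** Loads following batches until the cursor points at a row; false when the sender is done. */
    static boolean advance(Cursor c, Batch[] slots, List<List<Batch>> senders) {
        while (c.row >= slots[c.sender].rows.length) {
            c.batch++;
            if (c.batch >= senders.get(c.sender).size()) return false;
            slots[c.sender] = senders.get(c.sender).get(c.batch);
            c.row = 0;
        }
        return true;
    }
}
```

In this sketch, a sender whose first batch is `Batch.empty()` still occupies its slot through the dummy batch, and its later batches are merged normally once the cursor advances past the dummy. The real operator works on value vectors and a selection vector rather than int arrays, so this only illustrates the slot-alignment idea.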



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
