drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5514) Enhance VectorContainer to merge two row sets
Date Mon, 15 May 2017 22:13:04 GMT
Paul Rogers created DRILL-5514:

             Summary: Enhance VectorContainer to merge two row sets
                 Key: DRILL-5514
                 URL: https://issues.apache.org/jira/browse/DRILL-5514
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
            Priority: Minor
             Fix For: 1.11.0

Consider the concept of a "record batch" in Drill. On the one hand, one can envision a record
batch as a stack of records:

| a1 | b1 | c1 |
| a2 | b2 | c2 |

But Drill is columnar, so a record batch is really a "bundle" of vectors:

| a1 |    | b1 |    | c1 |
| a2 |    | b2 |    | c2 |
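As a hypothetical illustration, assume plain Java arrays stand in for Drill value vectors; the class and method names are invented for this sketch. The row view and the columnar view of the same two-record batch are then just transposes of each other:

```java
import java.util.Arrays;

// Hypothetical illustration only: String arrays stand in for Drill value vectors.
final class RowVsColumn {
    // Row-wise view: one array per record, e.g. {a1, b1, c1}.
    static final String[][] ROWS = {
        {"a1", "b1", "c1"},
        {"a2", "b2", "c2"},
    };

    // Columnar view: transpose the (rectangular) rows into one array per column.
    static String[][] toColumns(String[][] rows) {
        int rowCount = rows.length;
        int colCount = rowCount == 0 ? 0 : rows[0].length;
        String[][] cols = new String[colCount][rowCount];
        for (int r = 0; r < rowCount; r++) {
            for (int c = 0; c < colCount; c++) {
                cols[c][r] = rows[r][c];
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        // Prints one "vector" per line: [a1, a2], [b1, b2], [c1, c2]
        for (String[] vector : toColumns(ROWS)) {
            System.out.println(Arrays.toString(vector));
        }
    }
}
```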

There are times when it is handy to build up a record batch as a merge of two different vector bundles:

-- bundle 1 --    -- bundle 2 --
| a1 |    | b1 |        | c1 |
| a2 |    | b2 |        | c2 |

For example, consider a reader that reads columns (a, b) from a file. The {{ScanBatch}} might then add (c) as an implicit vector (the file name, say). The merged set of vectors forms the final schema: (a, b, c).
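A minimal sketch of the two bundles in this example, assuming a bundle is modeled as an ordered map from column name to a list of values. The class, methods, and the file name "data.csv" are hypothetical, not Drill APIs. It shows that the implicit column must be filled to the same row count the reader produced:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a "bundle" is an ordered map from column name to values.
// None of these names are Drill APIs.
final class ImplicitColumnSketch {

    // Stand-in for a file reader that produces columns (a, b).
    static Map<String, List<String>> readFile() {
        Map<String, List<String>> bundle = new LinkedHashMap<>();
        bundle.put("a", List.of("a1", "a2"));
        bundle.put("b", List.of("b1", "b2"));
        return bundle;
    }

    // Stand-in for the scan adding implicit column (c): the same file name
    // repeated once for every row the reader returned.
    static Map<String, List<String>> implicitColumns(String fileName, int rowCount) {
        Map<String, List<String>> bundle = new LinkedHashMap<>();
        bundle.put("c", Collections.nCopies(rowCount, fileName));
        return bundle;
    }

    public static void main(String[] args) {
        Map<String, List<String>> fromReader = readFile();
        int rowCount = fromReader.get("a").size();
        Map<String, List<String>> implicit = implicitColumns("data.csv", rowCount);
        System.out.println(fromReader.keySet() + " + " + implicit.keySet()); // [a, b] + [c]
        System.out.println(implicit.get("c")); // [data.csv, data.csv]
    }
}
```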

This ticket asks for the code to do the merge:

* Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
* Merge two vector containers C1 and C2 to create a new container, C3, that holds the vectors of both.
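A hypothetical sketch of the two requested operations, assuming plain Java collections stand in for Drill's schema and {{VectorContainer}} classes. The method names are invented, and the duplicate-name and row-count checks are assumptions about what a merge should reject, not behavior stated in this ticket:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collections stand in for Drill schemas and containers.
// None of the method names below are Drill APIs.
final class MergeSketch {

    // Merge schema A = (a, b) with B = (c) to get C = (a, b, c).
    // Order is preserved: all columns of A, then all columns of B.
    static List<String> mergeSchemas(List<String> a, List<String> b) {
        List<String> merged = new ArrayList<>(a);
        for (String col : b) {
            if (merged.contains(col)) {
                throw new IllegalArgumentException("Duplicate column: " + col);
            }
            merged.add(col);
        }
        return merged;
    }

    // Every vector in both containers must describe the same number of rows.
    static void checkSameRowCount(Map<String, List<String>> c1, Map<String, List<String>> c2) {
        Integer expected = null;
        for (Map<String, List<String>> c : List.of(c1, c2)) {
            for (Map.Entry<String, List<String>> e : c.entrySet()) {
                int size = e.getValue().size();
                if (expected == null) {
                    expected = size;
                } else if (size != expected) {
                    throw new IllegalArgumentException("Row count mismatch in column " + e.getKey());
                }
            }
        }
    }

    // Merge C1 and C2 into a new container C3. The vectors are shared, not copied.
    static Map<String, List<String>> mergeContainers(
            Map<String, List<String>> c1, Map<String, List<String>> c2) {
        checkSameRowCount(c1, c2);
        // Reuse the schema merge so duplicate names are rejected the same way.
        List<String> names = mergeSchemas(new ArrayList<>(c1.keySet()), new ArrayList<>(c2.keySet()));
        Map<String, List<String>> c3 = new LinkedHashMap<>();
        for (String name : names) {
            c3.put(name, c1.containsKey(name) ? c1.get(name) : c2.get(name));
        }
        return c3;
    }

    public static void main(String[] args) {
        System.out.println(mergeSchemas(List.of("a", "b"), List.of("c"))); // [a, b, c]
        Map<String, List<String>> c1 = new LinkedHashMap<>();
        c1.put("a", List.of("a1", "a2"));
        c1.put("b", List.of("b1", "b2"));
        Map<String, List<String>> c2 = new LinkedHashMap<>();
        c2.put("c", List.of("c1", "c2"));
        System.out.println(mergeContainers(c1, c2)); // {a=[a1, a2], b=[b1, b2], c=[c1, c2]}
    }
}
```

The row-count check reflects the columnar layout: a merged batch only makes sense if every vector in it describes the same rows.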

This message was sent by Atlassian JIRA
