drill-issues mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-173) Join operator should reuse ValueVectors when duplicate keys are present
Date Thu, 29 May 2014 15:46:15 GMT

     [ https://issues.apache.org/jira/browse/DRILL-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-173:
---------------------------------

    Fix Version/s: Future

> Join operator should reuse ValueVectors when duplicate keys are present
> -----------------------------------------------------------------------
>
>                 Key: DRILL-173
>                 URL: https://issues.apache.org/jira/browse/DRILL-173
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.0.0-milestone-1
>            Reporter: Ben Becker
>              Labels: optimization
>             Fix For: Future
>
>
> There are cases where joining two record batches can result in redundant work. Consider a merge join performed on two tables (*t1* and *t2*) with duplicate keys on both sides:
> h5. t1
> || key || value ||
> | 2 | 'a' |
> | 2 | 'b' |
> h5. t2
> || key || value ||
> | 2 | 'A' |
> | 2 | 'B' |
> | 2 | 'C' |
> The resulting table will contain the cross product of all rows with key '2':
> || key || t1.value || t2.value ||
> | 2 | 'a' | 'A' |
> | 2 | 'a' | 'B' |
> | 2 | 'a' | 'C' |
> | 2 | 'b' | 'A' |
> | 2 | 'b' | 'B' |
> | 2 | 'b' | 'C' |
> The current implementation iteratively copies t2.value from the incoming vectors for each matching t1 row. Ideally, the t2.value vector would be constructed iteratively only on the first pass; on later passes the already-built values can simply be copied.
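
The proposed reuse can be illustrated with a minimal sketch. It is not Drill's implementation: plain Java lists stand in for ValueVectors, and the names DuplicateKeyJoinSketch, Output, and joinRun are hypothetical, introduced only for this example. For one run of equal keys, the right-side values are gathered element by element for the first left row only; each later left row reuses that segment with a single bulk copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: plain Java lists stand in for Drill's ValueVectors.
final class DuplicateKeyJoinSketch {

    /** Output columns produced for one run of equal join keys. */
    static final class Output {
        final List<String> left = new ArrayList<>();
        final List<String> right = new ArrayList<>();
    }

    /**
     * Emits the cross product of one run of duplicate keys.
     * leftRun and rightRun hold the values of the rows that share the key.
     */
    static Output joinRun(List<String> leftRun, List<String> rightRun) {
        Output out = new Output();
        List<String> rightSegment = null;

        for (String leftValue : leftRun) {
            if (rightSegment == null) {
                // First pass: build the right-side segment one value at a time.
                rightSegment = new ArrayList<>(rightRun.size());
                for (String rightValue : rightRun) {
                    rightSegment.add(rightValue);
                }
            }
            // Every pass: append the prebuilt segment with one bulk copy
            // instead of walking the incoming right-side values again.
            out.right.addAll(rightSegment);

            // The left value is repeated once per right-side row.
            for (int i = 0; i < rightSegment.size(); i++) {
                out.left.add(leftValue);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The t1 and t2 values from the example tables; every row has key 2.
        Output out = joinRun(Arrays.asList("a", "b"), Arrays.asList("A", "B", "C"));
        for (int i = 0; i < out.left.size(); i++) {
            System.out.println("2 | " + out.left.get(i) + " | " + out.right.get(i));
        }
    }
}
```

In a columnar engine the bulk copy would presumably be a contiguous buffer copy rather than a list append, which is where the saving over per-value copying would come from; that detail is an assumption and is not modeled here.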



--
This message was sent by Atlassian JIRA
(v6.2#6252)
