drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (DRILL-5586) UnionAll operator does more than necessary value vector allocation and copy
Date Wed, 14 Jun 2017 00:01:38 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni reassigned DRILL-5586:
---------------------------------

    Assignee: Jinfeng Ni

> UnionAll operator does more than necessary value vector allocation and copy
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-5586
>                 URL: https://issues.apache.org/jira/browse/DRILL-5586
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> When the inputs to a UnionAll operator are simple field references, rather than expressions involving a function that requires evaluation, the operator should leverage the value vector's transfer API. A transfer avoids allocating a buffer for the value vector in the outgoing batch, as well as the overhead of copying data from the incoming batch to the outgoing batch.
> For example, in the following query:
> {code}
> select l_orderkey from cp.`tpch/lineitem.parquet` l union all select n_nationkey from cp.`tpch/nation.parquet`
> {code}
> Both the left and right inputs of the UnionAll operator are simple field references, so Drill should call the transfer API. However, the current code allocates a buffer and copies the data for both the left and right sides. This extra work significantly slows the UnionAll operator and, in turn, query evaluation.
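
To make the difference between the two paths concrete, here is a minimal sketch in Java. It is a toy model only: `ToyVector`, `copyTo`, and `transferTo` are hypothetical names invented for illustration and are not Drill's value vector API. The point is that a copy allocates a new buffer and duplicates the data, while a transfer only hands over the existing buffer.

```java
import java.util.Arrays;

// Toy model only: ToyVector is a hypothetical stand-in, not Drill's ValueVector API.
class TransferVsCopySketch {
    static final class ToyVector {
        static final int[] EMPTY = new int[0]; // shared marker for "holds no data"
        static int allocations = 0;            // number of buffers allocated by copyTo
        int[] buffer = EMPTY;                  // backing data of this vector

        // Wrap the given values without copying them.
        static ToyVector of(int... values) {
            ToyVector v = new ToyVector();
            v.buffer = values;
            return v;
        }

        // Copy: allocate a new buffer for the target and duplicate every value.
        void copyTo(ToyVector target) {
            allocations++;
            target.buffer = Arrays.copyOf(buffer, buffer.length);
        }

        // Transfer: hand the existing buffer to the target. Nothing is
        // allocated or copied, and the source gives up its data.
        void transferTo(ToyVector target) {
            target.buffer = buffer;
            buffer = EMPTY;
        }
    }

    public static void main(String[] args) {
        ToyVector incoming = ToyVector.of(1, 2, 3);
        ToyVector outgoing = new ToyVector();
        incoming.transferTo(outgoing);
        System.out.println("allocations after transfer: " + ToyVector.allocations);

        ToyVector incoming2 = ToyVector.of(4, 5, 6);
        ToyVector outgoing2 = new ToyVector();
        incoming2.copyTo(outgoing2);
        System.out.println("allocations after copy: " + ToyVector.allocations);
    }
}
```

In this model the cost of a copy grows with the number of values in the batch, whereas the cost of a transfer does not depend on the batch size.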
> DRILL-5521 reverts a change, made in DRILL-5419, to the logic that decides whether to apply a transfer based on a SchemaPath equality comparison. Even if we fix that problem, a SchemaPath equality comparison is not a sufficient criterion for deciding whether a transfer should be used. Ideally, even when the output field and the incoming field have different names, the UnionAll operator should do a {{transfer}} instead of a {{copy}}, as long as the expression is a simple field reference.
> {code}
> select l_orderkey as Key1 from cp.`tpch/lineitem.parquet` l union all select n_nationkey as Key2 from cp.`tpch/nation.parquet`
> {code}
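
The criterion proposed in the description can be sketched as follows. This is a hedged illustration, not Drill's code: `Expr`, `FieldRef`, `FunctionCall`, `Action`, and `choose` are all hypothetical names. The sketch decides from the shape of the input expression alone, so an alias such as Key1 does not force a copy.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these types are illustrative and are not Drill classes.
class UnionAllDecisionSketch {
    interface Expr {}

    // A bare column reference such as l_orderkey.
    static final class FieldRef implements Expr {
        final String column;
        FieldRef(String column) { this.column = column; }
    }

    // A function applied to arguments; it has to be evaluated to produce values.
    static final class FunctionCall implements Expr {
        final String name;
        final List<Expr> args;
        FunctionCall(String name, List<Expr> args) {
            this.name = name;
            this.args = args;
        }
    }

    enum Action { TRANSFER, COPY }

    // Decide from the expression type only. outputName is accepted but
    // deliberately unused: a differing output name should not prevent a transfer.
    static Action choose(Expr input, String outputName) {
        return (input instanceof FieldRef) ? Action.TRANSFER : Action.COPY;
    }

    public static void main(String[] args) {
        Expr simple = new FieldRef("l_orderkey");
        Expr computed = new FunctionCall("add", Arrays.<Expr>asList(new FieldRef("l_orderkey")));
        System.out.println(choose(simple, "Key1"));
        System.out.println(choose(computed, "Key1"));
    }
}
```

A name-based check would compare `outputName` with the column name and fall back to a copy for the aliased query above. Checking the expression type avoids that.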



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
