hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chao Sun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
Date Tue, 20 Jun 2017 22:23:02 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056577#comment-16056577
] 

Chao Sun commented on HIVE-11297:
---------------------------------

Sorry for the late response. Will put comments in the RB.
Regarding the filterOp issue, It's a little strange since I'm seeing something different on
my side (with the latest master branch). 
For the query you posted above, I saw:
{code}
TS[3] -> FIL[18] -> SEL[5] -> SEL[19] -> GBY[20] -> SPARKPRUNINGSINK[21]
TS[3] -> FIL[18] -> SEL[5] -> SEL[22] -> GBY[23] -> SPARKPRUNINGSINK[24]
TS[3] -> FIL[18] -> SEL[5] -> RS[7] -> JOIN[8] -> ...
{code}
inside {{SplitOpTreeForDPP}}.



> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
>                 Key: HIVE-11297
>                 URL: https://issues.apache.org/jira/browse/HIVE-11297
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: spark-branch
>            Reporter: Chao Sun
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, HIVE-11297.3.patch, HIVE-11297.4.patch,
HIVE-11297.5.patch, HIVE-11297.6.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates partition
info for more than one partition columns, multiple operator trees are created, which all start
from the same table scan op, but have different spark partition pruning sinks.
> As an optimization, we can combine these op trees and so don't have to do table scan
multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message