hive-issues mailing list archives

From "liyunzhang_intel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
Date Tue, 06 Jun 2017 08:58:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038414#comment-16038414 ]

liyunzhang_intel commented on HIVE-11297:
-----------------------------------------

[~csun]: we cannot do that, because GenSparkProcContext#clonedPruningTableScanSet is passed as the topNodes argument of GenSparkWorkWalker#startWalking, and GenSparkWorkWalker splits the operator tree at minimum cost. So if there is a single top node, TS[1], it will split the following tree
{noformat}
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
             -SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
into only one tree:
{noformat}
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
{noformat}

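The GenSparkWork debug log shows the walk from root TS[1] reaching only SPARKPRUNINGSINK[20]; SPARKPRUNINGSINK[23] never appears: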
{code}
[root@bdpe41 hive]# grep GenSparkWork logs/hive.log
2017-06-06T16:34:12,527 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: TS[0]
2017-06-06T16:34:12,527 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: RS[2]
2017-06-06T16:34:19,070 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: First pass. Leaf operator: RS[2]
2017-06-06T16:34:19,070 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: JOIN[5]
2017-06-06T16:34:19,070 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: RS[9]
2017-06-06T16:34:22,858 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Removing RS[2] as parent from JOIN[5]
2017-06-06T16:34:22,859 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Removing RS[4] as parent from JOIN[5]
2017-06-06T16:34:22,859 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: First pass. Leaf operator: RS[9]
2017-06-06T16:34:22,859 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: GBY[10]
2017-06-06T16:34:22,859 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: FS[12]
2017-06-06T16:34:27,322 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Removing RS[9] as parent from GBY[10]
2017-06-06T16:34:27,322 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: First pass. Leaf operator: FS[12]
2017-06-06T16:34:27,322 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: TS[1]
2017-06-06T16:34:27,322 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: RS[4]
2017-06-06T16:36:14,669 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Second pass. Leaf operator: RS[4] has common downstream work:org.apache.hadoop.hive.ql.plan.ReduceWork@7e7f72
2017-06-06T16:36:14,672 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: TS[1]
2017-06-06T16:36:14,672 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: SPARKPRUNINGSINK[20]
2017-06-06T16:38:22,338 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: First pass. Leaf operator: SPARKPRUNINGSINK[20]
{code}
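For contrast with the log above, here is a toy Java model of the combined tree. It is hypothetical: the class and method names are illustrative and are not Hive's GenSparkWorkWalker or Operator API, and it assumes, as the diagram suggests, that SEL[21] branches off FIL[17]. A plain depth-first enumeration of root-to-leaf paths from TS[1] yields two paths, one per pruning sink. That is what a correct split of the combined tree would have to produce, whereas the log shows work generated only for the SPARKPRUNINGSINK[20] path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy operator DAG keyed by operator label (hypothetical model, not Hive classes).
public class PruningPaths {
    private final Map<String, List<String>> children = new HashMap<>();

    // Record a parent -> child edge; children keep insertion order.
    void edge(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // Enumerate every root-to-leaf path with a depth-first walk.
    List<List<String>> rootToLeafPaths(String root) {
        List<List<String>> out = new ArrayList<>();
        walk(root, new ArrayList<>(), out);
        return out;
    }

    private void walk(String op, List<String> prefix, List<List<String>> out) {
        prefix.add(op);
        List<String> kids = children.getOrDefault(op, List.of());
        if (kids.isEmpty()) {
            // Leaf reached (a pruning sink here): save a copy of the current path.
            out.add(new ArrayList<>(prefix));
        } else {
            for (String kid : kids) {
                walk(kid, prefix, out);
            }
        }
        prefix.remove(prefix.size() - 1); // backtrack
    }

    // The combined tree from the comment: shared TS/FIL prefix, two sink branches.
    static PruningPaths combinedTree() {
        PruningPaths t = new PruningPaths();
        t.edge("TS[1]", "FIL[17]");
        t.edge("FIL[17]", "SEL[18]");
        t.edge("SEL[18]", "GBY[19]");
        t.edge("GBY[19]", "SPARKPRUNINGSINK[20]");
        t.edge("FIL[17]", "SEL[21]");
        t.edge("SEL[21]", "GBY[22]");
        t.edge("GBY[22]", "SPARKPRUNINGSINK[23]");
        return t;
    }

    public static void main(String[] args) {
        for (List<String> path : combinedTree().rootToLeafPaths("TS[1]")) {
            System.out.println(String.join("-", path));
        }
    }
}
```

Running main prints the two paths, TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20] and TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23], in that order.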


> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
>                 Key: HIVE-11297
>                 URL: https://issues.apache.org/jira/browse/HIVE-11297
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: spark-branch
>            Reporter: Chao Sun
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates partition info for more than one partition column, multiple operator trees are created. They all start from the same table scan op but have different Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that we don't have to do the table scan multiple times.
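The combining idea in the description can be pictured as a prefix merge: each generated op tree is treated as a linear chain that starts at the shared table scan, and the chains are merged so that equal leading operators are kept only once. The following is a minimal Java sketch under stated assumptions. Operators are represented by hypothetical string signatures (TS(t), FIL(f), and so on), and two operators count as mergeable when their signatures are equal. Operators that Hive clones for each pruning sink carry distinct IDs, so a real implementation would need its own equivalence check. The sketch is not taken from the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinePruningTrees {
    // Hypothetical tree node: an operator signature plus children keyed by signature.
    static final class Node {
        final String signature;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String signature) {
            this.signature = signature;
        }
    }

    // Merge linear chains that all start at the same table scan into one tree.
    static Node combine(List<List<String>> chains) {
        if (chains.isEmpty()) {
            throw new IllegalArgumentException("no chains to combine");
        }
        Node root = new Node(chains.get(0).get(0));
        for (List<String> chain : chains) {
            if (!chain.get(0).equals(root.signature)) {
                throw new IllegalArgumentException("all chains must start at the same table scan");
            }
            Node cur = root;
            // Walk the chain below the scan, reusing an existing child when signatures match.
            for (String sig : chain.subList(1, chain.size())) {
                cur = cur.children.computeIfAbsent(sig, Node::new);
            }
        }
        return root;
    }

    // Count operators in the merged tree (each shared operator counted once).
    static int countOperators(Node n) {
        int total = 1;
        for (Node c : n.children.values()) {
            total += countOperators(c);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical chains: two pruning sinks generated from the same scan and filter.
        List<List<String>> chains = List.of(
                List.of("TS(t)", "FIL(f)", "SEL(a)", "GBY(a)", "SINK(a)"),
                List.of("TS(t)", "FIL(f)", "SEL(b)", "GBY(b)", "SINK(b)"));
        int before = 0;
        for (List<String> chain : chains) {
            before += chain.size();
        }
        Node root = combine(chains);
        System.out.println("operators before: " + before + ", after: " + countOperators(root));
    }
}
```

With the two sample chains, main prints "operators before: 10, after: 8": the table scan and filter are kept once, and the tree branches below FIL(f) into one path per sink.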



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
