hive-issues mailing list archives

From "Chinna Rao Lalam (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-8858) Visualize generated Spark plan [Spark Branch]
Date Mon, 30 Mar 2015 14:29:53 GMT

     [ https://issues.apache.org/jira/browse/HIVE-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-8858:
-----------------------------------
    Attachment: HIVE-8858-spark.patch

Hi [~xuefuz],

I have uploaded a draft patch. It adds simple logic that builds each plan line by tracking backward (in reverse) from the ReduceTrans.
Here is the output for two sample queries.

{quote}
FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1
                         UNION  ALL  
      select s2.key as key, s2.value as value from src s2) unionsrc
INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5))
GROUP BY unionsrc.key
INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5))
GROUP BY unionsrc.key, unionsrc.value
{quote}

spark.SparkPlan (SparkPlan.java:logSparkPlan(95)) - ------------------------------ Spark Plan -----------------------------
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) - 	Reduce <-- ( Shuffle (cache off) )  <-- ( MapTran,Reduce )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) - 	Reduce <-- ( Shuffle (cache on) )  <-- ( MapTran )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) - 	Reduce <-- ( Shuffle (cache off) )  <-- ( MapTran,Reduce )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) - 	Reduce <-- ( Shuffle (cache on) )  <-- ( MapTran )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(104)) - ------------------------------ Spark Plan -----------------------------

{quote}
select * from	
(      
  select a.key, a.val as val1, b.val as val2 from T1 a join T2 b on a.key = b.key
    union all 	
  select a.key, a.val as val1, b.val as val2 from T1 a join T2 b on a.key = b.key
) subq1
ORDER BY key, val1, val2;
{quote}

spark.SparkPlan (SparkPlan.java:logSparkPlan(95)) - ------------------------------ Spark Plan -----------------------------
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) - 	Reduce <-- ( Shuffle (cache off) )  <-- ( Reduce,Reduce,Reduce,Reduce )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) - 	Reduce <-- ( Shuffle (cache off) )  <-- ( MapTran,MapTran )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) - 	Reduce <-- ( Shuffle (cache off) )  <-- ( MapTran,MapTran )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) - 	Reduce <-- ( Shuffle (cache off) )  <-- ( MapTran,MapTran )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) - 	Reduce <-- ( Shuffle (cache off) )  <-- ( MapTran,MapTran )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(104)) - ------------------------------ Spark Plan -----------------------------
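
For readers who want to see what the reverse tracking amounts to, here is a minimal sketch. It is not the code in HIVE-8858-spark.patch: PlanNode and PlanSketch are hypothetical stand-ins for the plan's tran classes, and the sketch assumes the plan is an acyclic graph in which every node knows its parents. Starting from a reduce node, it walks backward one level of parents at a time and prints each level between "<-- ( ... )".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical stand-in for one node of the generated plan (not Hive's real classes).
final class PlanNode {
    final String label;                                // e.g. "Reduce", "MapTran", "Shuffle (cache off)"
    final List<PlanNode> parents = new ArrayList<>();  // upstream nodes feeding this one

    PlanNode(String label, PlanNode... parents) {
        this.label = label;
        for (PlanNode p : parents) {
            this.parents.add(p);
        }
    }
}

final class PlanSketch {
    // Walk backward from a reduce node, level by level, and render each
    // level of parents as " <-- ( a,b )". Assumes the graph has no cycles.
    static String describe(PlanNode reduce) {
        StringBuilder sb = new StringBuilder(reduce.label);
        List<PlanNode> level = reduce.parents;
        while (!level.isEmpty()) {
            StringJoiner names = new StringJoiner(",");
            List<PlanNode> next = new ArrayList<>();
            for (PlanNode n : level) {
                names.add(n.label);
                for (PlanNode p : n.parents) {
                    if (!next.contains(p)) {           // a parent shared by siblings is listed once
                        next.add(p);
                    }
                }
            }
            sb.append(" <-- ( ").append(names).append(" )");
            level = next;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PlanNode input = new PlanNode("MapInput (cache off)");
        PlanNode map = new PlanNode("MapTran", input);
        PlanNode shuffle = new PlanNode("Shuffle (cache on)", map);
        PlanNode reduce = new PlanNode("Reduce", shuffle);
        System.out.println(describe(reduce));
    }
}
```

Running main prints "Reduce <-- ( Shuffle (cache on) ) <-- ( MapTran ) <-- ( MapInput (cache off) )", which has the same shape as the log lines above apart from spacing. Listing a shared parent only once is a choice made for this sketch; the patch may handle shared inputs differently.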

> Visualize generated Spark plan [Spark Branch]
> ---------------------------------------------
>
>                 Key: HIVE-8858
>                 URL: https://issues.apache.org/jira/browse/HIVE-8858
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-8858-spark.patch
>
>
> The Spark plan generated by SparkPlanGenerator contains information that isn't available in Hive's explain plan, such as RDD caching. Also, the graph is slightly different from the original SparkWork. Thus, it would be nice to visualize the plan as is done for SparkWork.
> Preferably, the visualization can happen as part of Hive's explain extended. If that is not feasible, we can at least log the plan at info level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
