hive-issues mailing list archives

From "Xuefu Zhang (JIRA)" <>
Subject [jira] [Commented] (HIVE-8858) Visualize generated Spark plan [Spark Branch]
Date Mon, 30 Mar 2015 20:31:53 GMT


Xuefu Zhang commented on HIVE-8858:

Hi Chinna, thanks for working on this. I haven't checked your patch, but the output looks
nice. I have a few suggestions:

1. We need numbering in the Trans. Otherwise, it's hard to visualize the graph.
2. Other information, such as the number of partitions in ShuffleTran, is also important to show.
3. It would be better if we log this graph in one line. The easiest way is to have a toString()
method in SparkPlan and then we can just log the string representation of SparkPlan.
4. To avoid long lines, we can show the graph in the same way as we show the work graph. For instance:
MapTran 1 <- MapInput 1 (cache off)
Shuffle1 (cache on) <- MapTran 1
Reduce 1 <- Shuffle1 (cache on)
Reduce 2 <- Shuffle1 (cache on)
Please note that this may not represent a valid plan.
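
As a rough illustration of suggestions 1 to 4, here is a minimal sketch of a toString() that prints one "child <- parents" edge per line. PlanGraphSketch and Node are hypothetical stand-ins, not Hive's actual SparkPlan or SparkTran classes, and the "partitions=4" detail format is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for SparkPlan; class names and output format are assumptions.
final class PlanGraphSketch {

    /** A numbered node with optional extra details (suggestions 1 and 2). */
    static final class Node {
        final String kind;
        final int id;
        final String details; // e.g. "cache on, partitions=4"; empty if none

        Node(String kind, int id, String details) {
            this.kind = kind;
            this.id = id;
            this.details = details;
        }

        String label() {
            String base = kind + " " + id;
            return details.isEmpty() ? base : base + " (" + details + ")";
        }
    }

    // child -> parents, kept in insertion order so the output is stable
    private final Map<Node, List<Node>> parents = new LinkedHashMap<>();

    /** Records that child consumes the output of parent. */
    void connect(Node parent, Node child) {
        parents.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
    }

    /** One "child <- parent, parent" edge per line (suggestions 3 and 4). */
    @Override
    public String toString() {
        StringJoiner lines = new StringJoiner("\n");
        for (Map.Entry<Node, List<Node>> e : parents.entrySet()) {
            StringJoiner ps = new StringJoiner(", ");
            for (Node p : e.getValue()) {
                ps.add(p.label());
            }
            lines.add(e.getKey().label() + " <- " + ps);
        }
        return lines.toString();
    }

    public static void main(String[] args) {
        Node input = new Node("MapInput", 1, "cache off");
        Node map = new Node("MapTran", 1, "");
        Node shuffle = new Node("ShuffleTran", 1, "cache on, partitions=4");
        Node reduce1 = new Node("ReduceTran", 1, "");
        Node reduce2 = new Node("ReduceTran", 2, "");

        PlanGraphSketch plan = new PlanGraphSketch();
        plan.connect(input, map);
        plan.connect(map, shuffle);
        plan.connect(shuffle, reduce1);
        plan.connect(shuffle, reduce2);

        // In Hive this string would go to the logger at info level instead.
        System.out.println(plan);
    }
}
```

Keeping the rendering in toString() means the caller only has to log the plan object. Like the example above, the plan built in main is for illustration and may not be a valid one.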

[~jxiang]/[~csun], please feel free to share your thoughts.

> Visualize generated Spark plan [Spark Branch]
> ---------------------------------------------
>                 Key: HIVE-8858
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-8858-spark.patch
> The Spark plan generated by SparkPlanGenerator contains info which isn't available in
> Hive's explain plan, such as RDD caching. Also, the graph is slightly different from the original
> SparkWork. Thus, it would be nice to visualize the plan as is done for SparkWork.
> Preferably, the visualization can happen as part of Hive explain extended. If not feasible,
> we can at least log this at info level.
