hive-dev mailing list archives

From "Sahil Takiar (JIRA)" <>
Subject [jira] [Created] (HIVE-18368) Improve SparkPlan Graph
Date Thu, 04 Jan 2018 01:25:00 GMT
Sahil Takiar created HIVE-18368:

             Summary: Improve SparkPlan Graph
                 Key: HIVE-18368
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
            Reporter: Sahil Takiar
            Assignee: Sahil Takiar

The {{SparkPlan}} class logs the mapping between the different {{SparkTran}}s, which shuffle
types are used, and which trans are cached. However, there is room for improvement.

When debug logging is enabled, the RDD graph is logged as well, but little information is
printed about each RDD.

We should combine the two graphs and improve them. We could even make the Spark Plan graph
part of the {{EXPLAIN EXTENDED}} output.

Ideally, the final graph shows a clear relationship between Tran objects, RDDs, and BaseWorks.
Edges should include information about the number of partitions, shuffle types, Spark
operations used, etc.
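
As a rough illustration of what such a combined graph could carry, here is a minimal sketch in
plain Java. It is not taken from Hive: {{PlanGraphSketch}}, {{Node}}, {{Edge}}, and the
{{render()}} output format are all hypothetical, and the tran, work, RDD, and shuffle labels in
the usage note are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types are part of Hive's SparkPlan API.
final class PlanGraphSketch {

  // A node ties together a tran, the BaseWork it runs, and the RDD it produces.
  static final class Node {
    final String tran;
    final String work;
    final String rdd;
    final boolean cached;

    Node(String tran, String work, String rdd, boolean cached) {
      this.tran = tran;
      this.work = work;
      this.rdd = rdd;
      this.cached = cached;
    }

    String label() {
      return tran + " [work=" + work + ", rdd=" + rdd + (cached ? ", cached" : "") + "]";
    }
  }

  // An edge records the shuffle type and partition count between two trans.
  static final class Edge {
    final String from;
    final String to;
    final String shuffleType;
    final int numPartitions;

    Edge(String from, String to, String shuffleType, int numPartitions) {
      this.from = from;
      this.to = to;
      this.shuffleType = shuffleType;
      this.numPartitions = numPartitions;
    }
  }

  // Nodes are keyed by tran name; insertion order is preserved.
  private final Map<String, Node> nodes = new LinkedHashMap<>();
  private final List<Edge> edges = new ArrayList<>();

  PlanGraphSketch addNode(String tran, String work, String rdd, boolean cached) {
    nodes.put(tran, new Node(tran, work, rdd, cached));
    return this;
  }

  PlanGraphSketch connect(String from, String to, String shuffleType, int numPartitions) {
    if (!nodes.containsKey(from) || !nodes.containsKey(to)) {
      throw new IllegalArgumentException("unknown tran: " + from + " or " + to);
    }
    edges.add(new Edge(from, to, shuffleType, numPartitions));
    return this;
  }

  // Renders one line per edge, in the order the edges were added.
  String render() {
    StringBuilder sb = new StringBuilder();
    for (Edge e : edges) {
      sb.append(nodes.get(e.from).label())
          .append(" --(").append(e.shuffleType)
          .append(", partitions=").append(e.numPartitions).append(")--> ")
          .append(nodes.get(e.to).label())
          .append('\n');
    }
    return sb.toString();
  }
}
```

For example, adding the nodes {{("MapTran 1", "Map 1", "rdd-3", true)}} and
{{("ReduceTran 2", "Reducer 2", "rdd-5", false)}} and connecting them with
{{connect("MapTran 1", "ReduceTran 2", "sort", 4)}} would make {{render()}} return a single line
that shows both trans with their works and RDDs, joined by an edge labelled with the shuffle
type and partition count.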

