hive-issues mailing list archives

From "Sahil Takiar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph
Date Thu, 04 Jan 2018 22:06:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-18368:
--------------------------------
    Summary: Improve Spark Debug RDD Graph  (was: Improve SparkPlan Graph)

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>
> The {{SparkPlan}} class logs the mapping between the different {{SparkTran}} objects, which shuffle types are used, and which trans are cached. However, there is room for improvement.
> When debug logging is enabled, the RDD graph is logged as well, but little information is printed about each RDD.
> We should combine the two graphs and improve them. We could even make the Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, RDDs, and BaseWorks. Edges should include information about the number of partitions, shuffle types, Spark operations used, etc.
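
One possible shape for such a combined graph is sketched below. This is only an illustration, not Hive code: the PlanGraphSketch, Node, and Edge types, their fields, and the labels used in main are all hypothetical, and the shuffle type is passed as a plain string rather than any real Hive enum. Each node ties a tran to the work it wraps and the RDD it produced, and each edge carries the shuffle type and partition count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in Hive. They only illustrate
// one possible layout for a combined Tran / RDD / BaseWork graph.
class PlanGraphSketch {

  /** A node ties together a tran, the work it wraps, and the RDD it produced. */
  static final class Node {
    final String tranName;  // illustrative label for a SparkTran
    final String workName;  // illustrative label for a BaseWork
    final String rddInfo;   // illustrative description of the resulting RDD
    final boolean cached;   // whether the tran's output is cached

    Node(String tranName, String workName, String rddInfo, boolean cached) {
      this.tranName = tranName;
      this.workName = workName;
      this.rddInfo = rddInfo;
      this.cached = cached;
    }

    String label() {
      return tranName + " (" + workName + ", " + rddInfo + (cached ? ", cached" : "") + ")";
    }
  }

  /** An edge records how data moves between two nodes. */
  static final class Edge {
    final Node from;
    final Node to;
    final String shuffleType;  // free-form string here, an assumption of this sketch
    final int numPartitions;

    Edge(Node from, Node to, String shuffleType, int numPartitions) {
      this.from = from;
      this.to = to;
      this.shuffleType = shuffleType;
      this.numPartitions = numPartitions;
    }
  }

  private final List<Edge> edges = new ArrayList<>();

  void connect(Node from, Node to, String shuffleType, int numPartitions) {
    edges.add(new Edge(from, to, shuffleType, numPartitions));
  }

  /** Renders one line per edge, in insertion order. */
  String render() {
    StringBuilder sb = new StringBuilder();
    for (Edge e : edges) {
      sb.append(e.from.label())
        .append(" --[").append(e.shuffleType)
        .append(", partitions=").append(e.numPartitions)
        .append("]--> ")
        .append(e.to.label())
        .append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // All names and numbers below are made up for the example.
    Node map = new Node("MapTran 1", "Map 1", "MapPartitionsRDD[3]", true);
    Node reduce = new Node("ReduceTran 2", "Reducer 2", "MapPartitionsRDD[5]", false);
    PlanGraphSketch graph = new PlanGraphSketch();
    graph.connect(map, reduce, "SHUFFLE_SORT", 4);
    System.out.print(graph.render());
  }
}
```

Keeping the per-edge attributes on an explicit edge object, rather than folding them into node labels, would make it straightforward to add further fields such as the Spark operation used.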



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
