spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1015) Visualize the DAG of RDD
Date Thu, 26 Feb 2015 11:08:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338241#comment-14338241 ]

Sean Owen commented on SPARK-1015:
----------------------------------

[~zjffdu] are you planning on working on this? We also have {{toDebugString}} which prints
some of this info. How would the visualization work with spark-shell? Is this just a utility
you can host outside Spark?

> Visualize the DAG of RDD 
> -------------------------
>
>                 Key: SPARK-1015
>                 URL: https://issues.apache.org/jira/browse/SPARK-1015
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 0.9.0
>            Reporter: Jeff Zhang
>
> A DAG of the RDDs can help users understand the data flow and how Spark executes the
> final RDD. It could also help users find opportunities to optimize the execution of
> complex RDDs. I will use Graphviz to visualize the DAG.
> For this task, I plan to split it into 2 steps.
> Step 1. Visualize the simple DAG. Each RDD is one node, and there is one edge between
> each parent RDD and its child RDD. (I attached one simple graph in the attachments.)
> Step 2. Put the RDDs that belong to the same stage into one subgraph. This may require
> extracting the stage-splitting code from DAGScheduler.
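
The two steps above can be sketched without Spark at all. The following is only an
illustration, not the implementation proposed in the issue: `Node`, `assign_stages`, and
`to_dot` are hypothetical names, the lineage is hand-written sample data, and the stage
grouping assumes that a stage is simply a set of RDDs connected by non-shuffle (narrow)
dependencies, which is a simplification of what DAGScheduler actually does.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an RDD: an id, a label, and its parents."""
    id: int
    name: str
    # Each parent is a (parent_id, is_shuffle) pair.
    parents: list = field(default_factory=list)


def assign_stages(nodes):
    """Assumed rule: nodes linked by narrow (non-shuffle) edges share a stage."""
    # Undirected adjacency over narrow edges only.
    adj = {n.id: set() for n in nodes}
    for n in nodes:
        for pid, is_shuffle in n.parents:
            if not is_shuffle:
                adj[n.id].add(pid)
                adj[pid].add(n.id)

    stage_of = {}
    next_stage = 0
    for n in sorted(nodes, key=lambda node: node.id):
        if n.id in stage_of:
            continue
        # Flood-fill one connected component and give it a stage number.
        stage_of[n.id] = next_stage
        stack = [n.id]
        while stack:
            cur = stack.pop()
            for nb in adj[cur]:
                if nb not in stage_of:
                    stage_of[nb] = next_stage
                    stack.append(nb)
        next_stage += 1
    return stage_of


def to_dot(nodes, group_by_stage=False):
    """Render the lineage as Graphviz DOT text.

    Step 1: one node per RDD, one edge from parent to child.
    Step 2 (group_by_stage=True): wrap each stage in a cluster subgraph.
    """
    lines = ["digraph rdd_dag {"]
    if group_by_stage:
        stage_of = assign_stages(nodes)
        stages = {}
        for n in nodes:
            stages.setdefault(stage_of[n.id], []).append(n)
        for stage_id in sorted(stages):
            lines.append(f"  subgraph cluster_{stage_id} {{")
            lines.append(f'    label="stage {stage_id}";')
            for n in stages[stage_id]:
                lines.append(f'    n{n.id} [label="{n.name}"];')
            lines.append("  }")
    else:
        for n in nodes:
            lines.append(f'  n{n.id} [label="{n.name}"];')
    for n in nodes:
        for pid, is_shuffle in n.parents:
            # Shuffle edges are dashed so stage boundaries stand out.
            style = " [style=dashed]" if is_shuffle else ""
            lines.append(f"  n{pid} -> n{n.id}{style};")
    lines.append("}")
    return "\n".join(lines)


# Hand-written sample lineage (not taken from a real job).
lineage = [
    Node(0, "textFile"),
    Node(1, "flatMap", [(0, False)]),
    Node(2, "map", [(1, False)]),
    Node(3, "reduceByKey", [(2, True)]),
]
```

Writing the result of `to_dot(lineage, group_by_stage=True)` to a file such as `dag.dot`
and running `dot -Tpng dag.dot -o dag.png` should produce an image. In a real
implementation the node list would have to be built from the RDD's own dependency
information rather than written by hand.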



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

