spark-issues mailing list archives

From "Andrew Or (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-1538) SparkUI forgets about all persisted RDD's not associated with stages
Date Sat, 19 Apr 2014 00:49:15 GMT
Andrew Or created SPARK-1538:
--------------------------------

             Summary: SparkUI forgets about all persisted RDD's not associated with stages
                 Key: SPARK-1538
                 URL: https://issues.apache.org/jira/browse/SPARK-1538
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 0.9.1
            Reporter: Andrew Or
            Priority: Blocker
             Fix For: 1.0.0


The following command creates two RDDs in one Stage:

sc.parallelize(1 to 1000, 4).persist.map(_ + 1).count

More specifically, parallelize creates one, and map creates another. If we persist only the
first one, it does not actually show up on the StorageTab of the SparkUI.

This is because StageInfo keeps information only for the last RDD associated with the
stage and discards all of that RDD's ancestors. The proposal here is to have StageInfo walk
up the RDD dependency chain and keep a list of all associated RDDInfos.
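
The traversal proposed above might look roughly like the following sketch. It does not use Spark's real classes: ToyRdd, ToyRddInfo, StageLineage, and collectInfos are hypothetical names made up for illustration, and the toy model ignores stage (shuffle) boundaries, which a real implementation would presumably need to respect.

```scala
// Toy stand-ins for illustration only; these are NOT Spark's RDD or RDDInfo.
final case class ToyRdd(
    id: Int,
    name: String,
    persisted: Boolean,
    parents: Seq[ToyRdd] = Nil)

final case class ToyRddInfo(id: Int, name: String, persisted: Boolean)

object StageLineage {
  // Collect info for `last` and for every ancestor reachable through `parents`.
  // Each RDD is visited once, even when several children share a parent.
  def collectInfos(last: ToyRdd): Seq[ToyRddInfo] = {
    val seen = scala.collection.mutable.LinkedHashMap.empty[Int, ToyRddInfo]
    def visit(rdd: ToyRdd): Unit = {
      if (!seen.contains(rdd.id)) {
        seen(rdd.id) = ToyRddInfo(rdd.id, rdd.name, rdd.persisted)
        rdd.parents.foreach(visit)
      }
    }
    visit(last)
    seen.values.toSeq
  }
}

object Example {
  // Mirrors the shape of sc.parallelize(1 to 1000, 4).persist.map(_ + 1)
  val parallelized: ToyRdd = ToyRdd(0, "parallelize", persisted = true)
  val mapped: ToyRdd = ToyRdd(1, "map", persisted = false, parents = Seq(parallelized))

  // Keeping only the last RDD would lose the persisted parent;
  // walking the parents keeps both.
  val infos: Seq[ToyRddInfo] = StageLineage.collectInfos(mapped)
}
```

With a list like this, a storage view could filter on the persisted flag and still find the parallelize RDD even though map is the last RDD in the stage.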



--
This message was sent by Atlassian JIRA
(v6.2#6252)
