spark-issues mailing list archives

From "Baoxu Shi (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-2245) VertexRDD can not be materialized for checkpointing
Date Wed, 25 Jun 2014 01:44:24 GMT

    [ https://issues.apache.org/jira/browse/SPARK-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041448#comment-14041448
] 

Baoxu Shi edited comment on SPARK-2245 at 6/25/14 1:43 AM:
-----------------------------------------------------------

Hi [~ankurd], I updated my pull request, but it now hits another exception: ShippableVertexPartition
is not serializable. After I made it serializable, the next exception was that
org.apache.spark.graphx.impl.RoutingTablePartition is not serializable. I made that serializable
as well, but iteration 2 then fails with:
org.apache.spark.graphx.impl.ShippableVertexPartition cannot be cast to scala.Tuple2

The code I'm using is:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx._

    val conf = new SparkConf().setAppName("HDTM")
      .setMaster("local[4]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("./checkpoint")
    val v = sc.parallelize(Seq[(VertexId, Long)]((0L, 0L), (1L, 1L), (2L, 2L)))
    val e = sc.parallelize(Seq[Edge[Long]](Edge(0L, 1L, 0L), Edge(1L, 2L, 1L), Edge(2L, 0L, 2L)))
    var g = Graph(v, e)

    val vertexIds = Seq(0L, 1L, 2L)
    var prevG: Graph[VertexId, Long] = null
    for (i <- 1 to 2000) {
      vertexIds.toStream.foreach(id => {
        prevG = g
        g = Graph(g.vertices, g.edges)

        g.vertices.cache()
        g.edges.cache()
        prevG.unpersistVertices(blocking = false)
        prevG.edges.unpersist(blocking = false)
      })

      g.vertices.checkpoint()
      g.edges.checkpoint()

      g.edges.count()
      g.vertices.count()
      println(s"${g.vertices.isCheckpointed} ${g.edges.isCheckpointed}")

      println(" iter " + i + " finished")
    }

    println(g.vertices.collect().mkString(" "))
    println(g.edges.collect().mkString(" "))

Am I on the right track, or is there a better way to make this change?


was (Author: bxshi):
Just submit the changes, thanks!

> VertexRDD can not be materialized for checkpointing
> ---------------------------------------------------
>
>                 Key: SPARK-2245
>                 URL: https://issues.apache.org/jira/browse/SPARK-2245
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>            Reporter: Baoxu Shi
>
> It seems that a VertexRDD cannot be materialized simply by calling its count method, which
> VertexRDD overrides. Calling RDD's count, however, does materialize it.
> Is this by design, so that the count can be obtained without materializing the VertexRDD?
> If so, do you think it would be worth adding a materialize method to VertexRDD?
> By the way, is count() the cheapest way to materialize an RDD, or does it cost the same
> resources as other actions?
> The pull request is here:
> https://github.com/apache/spark/pull/1177
> Best,
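
The materialize method asked about in the description could be sketched roughly as below.
This is a minimal sketch, not code from the pull request: `materialize` is a hypothetical
helper name, the app name, master and checkpoint directory are placeholders, and it is
shown on a plain RDD. Whether it behaves the same on a VertexRDD in the affected Spark
version is an assumption that has not been checked here. It uses foreach with a no-op
function, which runs a job directly on the RDD and consumes every element.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MaterializeSketch {
  // Hypothetical helper (not part of Spark): runs a job on `rdd` itself and
  // consumes every element, discarding the values.
  def materialize[T](rdd: RDD[T]): Unit = rdd.foreach(_ => ())

  def main(args: Array[String]): Unit = {
    // Placeholder configuration for a local run.
    val conf = new SparkConf().setAppName("materialize-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("./checkpoint")

    val rdd = sc.parallelize(1 to 10).map(_ * 2)
    rdd.checkpoint()   // must be requested before the first job runs on this RDD
    materialize(rdd)   // the job that gives Spark the chance to write the checkpoint
    println(rdd.isCheckpointed)

    sc.stop()
  }
}
```

Calling cache() on the RDD before checkpoint() is commonly recommended so that the
checkpoint write does not recompute the lineage a second time.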



--
This message was sent by Atlassian JIRA
(v6.2#6252)
