spark-issues mailing list archives

From "Baoxu Shi (JIRA)" <>
Subject [jira] [Comment Edited] (SPARK-2245) VertexRDD can not be materialized for checkpointing
Date Wed, 25 Jun 2014 01:44:24 GMT


Baoxu Shi edited comment on SPARK-2245 at 6/25/14 1:43 AM:

Hi [~ankurd], I changed my pull request, but now there is another exception: ShippableVertexPartition
is not serializable. I made it serializable, which led to another exception: org.apache.spark.graphx.impl.RoutingTablePartition
is not serializable. After making that serializable as well, iteration 2 fails with this exception:
org.apache.spark.graphx.impl.ShippableVertexPartition cannot be cast to scala.Tuple2
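
For readers unfamiliar with the "is not serializable" errors above, here is a minimal sketch of how they arise, assuming Java serialization is in use (Spark's default serializer). PlainPartition and SerializablePartition are hypothetical stand-ins, not the GraphX classes, and canSerialize is a helper written only for this illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-ins for a partition class; these are NOT GraphX classes.
class PlainPartition(val ids: Array[Long])
class SerializablePartition(val ids: Array[Long]) extends Serializable

object SerializationSketch {
  // Returns true if Java serialization accepts the object,
  // false if it throws NotSerializableException.
  def canSerialize(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // The class without the Serializable marker is rejected.
    println(canSerialize(new PlainPartition(Array(0L, 1L, 2L))))
    // Adding "extends Serializable" is enough here because the only field is an Array[Long].
    println(canSerialize(new SerializablePartition(Array(0L, 1L, 2L))))
  }
}
```

Marking a class Serializable only helps if every field it holds is serializable too, which may be why fixing one class surfaced the same error for the next one.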

The code I'm using is:

    val conf = new SparkConf().setAppName("HDTM")
    val sc = new SparkContext(conf)
    val v = sc.parallelize(Seq[(VertexId, Long)]((0L, 0L), (1L, 1L), (2L, 2L)))
    val e = sc.parallelize(Seq[Edge[Long]](Edge(0L, 1L, 0L), Edge(1L, 2L, 1L), Edge(2L, 0L,
    var g = Graph(v, e)

    val vertexIds = Seq(0L, 1L, 2L)
    var prevG: Graph[VertexId, Long] = null
    for (i <- 1 to 2000) {
      vertexIds.toStream.foreach(id => {
        prevG = g
        g = Graph(g.vertices, g.edges)

        prevG.unpersistVertices(blocking = false)
        prevG.edges.unpersist(blocking = false)
      })

      println(s"${g.vertices.isCheckpointed} ${g.edges.isCheckpointed}")

      println(" iter " + i + " finished")
    }
    println(g.vertices.collect().mkString(" "))
    println(g.edges.collect().mkString(" "))

Am I on the right track, or is there another way to change it?

was (Author: bxshi):
Just submit the changes, thanks!

> VertexRDD can not be materialized for checkpointing
> ---------------------------------------------------
>                 Key: SPARK-2245
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>            Reporter: Baoxu Shi
> It seems one cannot materialize a VertexRDD simply by calling its count method, which VertexRDD overrides. Calling RDD's count, however, does materialize it.
> Is this a feature designed to get the count without materializing the VertexRDD? If so, do you think it is necessary to add a materialize method to VertexRDD?
> By the way, is count() the cheapest way to materialize an RDD, or does it cost the same resources as other actions?
> The pull request is here:
> Best,
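
As background for the question in the quoted description, here is a minimal sketch of checkpointing a plain RDD and forcing it with an action. It assumes local mode and a scratch checkpoint directory, and the names CheckpointSketch and checkpointWithAction are made up for this illustration. It uses an ordinary RDD of pairs, not a VertexRDD, so it does not show whether the same approach works for VertexRDD, which is what this issue is about.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  // Marks an RDD for checkpointing, then runs an action over all partitions.
  // Returns isCheckpointed as observed before and after the action.
  def checkpointWithAction(sc: SparkContext): (Boolean, Boolean) = {
    val rdd = sc.parallelize(Seq((0L, 0L), (1L, 1L), (2L, 2L))).cache()

    // checkpoint() only marks the RDD; the data is written when a job computes it.
    rdd.checkpoint()
    val before = rdd.isCheckpointed

    // An action that touches every partition and discards the data.
    rdd.foreachPartition(_ => ())
    val after = rdd.isCheckpointed

    (before, after)
  }

  def main(args: Array[String]): Unit = {
    // Assumptions: local master and a scratch directory, for illustration only.
    val conf = new SparkConf().setAppName("CheckpointSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("/tmp/spark-checkpoint-sketch")
    try println(checkpointWithAction(sc))
    finally sc.stop()
  }
}
```

The sketch uses foreachPartition rather than count to avoid relying on a method that a subclass may override. Whether it is cheaper than count() in practice is not something this sketch measures.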

This message was sent by Atlassian JIRA
