spark-issues mailing list archives

From "Glenn Strycker (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1883) spark graph.triplets does not return correct values
Date Mon, 19 May 2014 20:47:38 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002358#comment-14002358 ]

Glenn Strycker commented on SPARK-1883:
---------------------------------------

Sorry, this has been fixed -- https://issues.apache.org/jira/browse/SPARK-1188

Thanks to rxin for pointing this out on my email list question http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-td6693.html

-----


This was caused by an optimization in GraphX that reuses a single triplet object; when 
you call collect directly on triplets, that same object is returned for every element. 

It has been fixed in Spark 1.0 here: 
https://issues.apache.org/jira/browse/SPARK-1188

To work around this in older versions of Spark, you can add a copy step, 
e.g. 

graph.triplets.map(_.copy()).collect() 
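
For intuition, here is a minimal self-contained sketch of the pitfall (this is NOT the actual GraphX code -- `Triplet`, `ReuseDemo`, and `triplets` are hypothetical names modeling the reuse optimization in plain Scala, with no Spark dependency):

```scala
// Hypothetical model of the SPARK-1883 symptom: an iterator that mutates
// and yields ONE shared object, mimicking GraphX's triplet-reuse optimization.
case class Triplet(var src: Long, var dst: Long, var attr: Int)

object ReuseDemo {
  // Yields the same Triplet instance for every edge, mutated in place.
  def triplets(edges: Seq[(Long, Long, Int)]): Iterator[Triplet] = {
    val shared = Triplet(0L, 0L, 0)
    edges.iterator.map { case (s, d, a) =>
      shared.src = s; shared.dst = d; shared.attr = a
      shared
    }
  }

  def main(args: Array[String]): Unit = {
    val edges = Seq((1L, 4L, 1), (1L, 5L, 1), (7L, 3L, 1))

    // Naive materialization: every array slot holds the SAME object, so
    // every row shows the last edge -- the repeated-edge symptom above.
    val bad = triplets(edges).toArray
    println(bad.mkString(", "))

    // Workaround: copy each triplet before materializing. The case-class
    // copy() returns a fresh instance, so the distinct edges survive.
    val good = triplets(edges).map(_.copy()).toArray
    println(good.mkString(", "))
  }
}
```

The same reasoning explains why the `.map(_.copy())` workaround fixes `collect()` in pre-1.0 Spark: the copy happens while each triplet's values are still current, before the shared object is overwritten by the next element.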

> spark graph.triplets does not return correct values
> ---------------------------------------------------
>
>                 Key: SPARK-1883
>                 URL: https://issues.apache.org/jira/browse/SPARK-1883
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Glenn Strycker
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> graph.triplets does not work -- it returns incorrect results 
> I have a graph with the following edges: 
> orig_graph.edges.collect 
> = Array(Edge(1,4,1), Edge(1,5,1), Edge(1,7,1), Edge(2,5,1), Edge(2,6,1), Edge(3,5,1),
> Edge(3,6,1), Edge(3,7,1), Edge(4,1,1), Edge(5,1,1), Edge(5,2,1), Edge(5,3,1), Edge(6,2,1),
> Edge(6,3,1), Edge(7,1,1), Edge(7,3,1)) 
> When I run triplets.collect, I only get the last edge repeated 16 times: 
> orig_graph.triplets.collect 
> = Array(((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
> ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
> ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1)) 
> I've also tried writing various map steps first before calling the triplet function,
> but I get the same results as above. 
> Similarly, the example on the graphx programming guide page (http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html)
> is incorrect. 
> val facts: RDD[String] = 
>   graph.triplets.map(triplet => 
>     triplet.srcAttr._1 + " is the " + triplet.attr + " of " + triplet.dstAttr._1) 
> does not work, but 
> val facts: RDD[String] = 
>   graph.triplets.map(triplet => 
>     triplet.srcAttr + " is the " + triplet.attr + " of " + triplet.dstAttr) 
> does work, although the results are meaningless.  For my graph example, I get the following
> line repeated 16 times: 
> 1 is the 1 of 1



--
This message was sent by Atlassian JIRA
(v6.2#6252)
