spark-issues mailing list archives

From "Daniel Darabos (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1188) GraphX triplets not working properly
Date Wed, 23 Apr 2014 09:30:17 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978027#comment-13978027 ]

Daniel Darabos commented on SPARK-1188:
---------------------------------------

The changes are in the master branch now. I can't figure out how to close a JIRA ticket :).

> GraphX triplets not working properly
> ------------------------------------
>
>                 Key: SPARK-1188
>                 URL: https://issues.apache.org/jira/browse/SPARK-1188
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 0.9.0
>            Reporter: Kev Alan
>
> I followed the GraphX tutorial at http://ampcamp.berkeley.edu/big-data-mini-course/graph-analytics-with-graphx.html on a local stand-alone cluster (Spark version 0.9.0) with two workers. Somehow, graph.triplets is not returning what it should -- only Eds and Frans.
> ```
> scala> graph.edges.toArray
> 14/03/04 16:15:57 INFO SparkContext: Starting job: collect at EdgeRDD.scala:51
> 14/03/04 16:15:57 INFO DAGScheduler: Got job 5 (collect at EdgeRDD.scala:51) with 1 output partitions (allowLocal=false)
> 14/03/04 16:15:57 INFO DAGScheduler: Final stage: Stage 27 (collect at EdgeRDD.scala:51)
> 14/03/04 16:15:57 INFO DAGScheduler: Parents of final stage: List()
> 14/03/04 16:15:57 INFO DAGScheduler: Missing parents: List()
> 14/03/04 16:15:57 INFO DAGScheduler: Submitting Stage 27 (MappedRDD[36] at map at EdgeRDD.scala:51), which has no missing parents
> 14/03/04 16:15:57 INFO DAGScheduler: Submitting 1 missing tasks from Stage 27 (MappedRDD[36] at map at EdgeRDD.scala:51)
> 14/03/04 16:15:57 INFO TaskSchedulerImpl: Adding task set 27.0 with 1 tasks
> 14/03/04 16:15:57 INFO TaskSetManager: Starting task 27.0:0 as TID 11 on executor localhost: localhost (PROCESS_LOCAL)
> 14/03/04 16:15:57 INFO TaskSetManager: Serialized task 27.0:0 as 2068 bytes in 1 ms
> 14/03/04 16:15:57 INFO Executor: Running task ID 11
> 14/03/04 16:15:57 INFO BlockManager: Found block rdd_2_0 locally
> 14/03/04 16:15:57 INFO Executor: Serialized size of result for 11 is 936
> 14/03/04 16:15:57 INFO Executor: Sending result for 11 directly to driver
> 14/03/04 16:15:57 INFO Executor: Finished task ID 11
> 14/03/04 16:15:57 INFO TaskSetManager: Finished TID 11 in 13 ms on localhost (progress: 0/1)
> 14/03/04 16:15:57 INFO DAGScheduler: Completed ResultTask(27, 0)
> 14/03/04 16:15:57 INFO TaskSchedulerImpl: Remove TaskSet 27.0 from pool
> 14/03/04 16:15:57 INFO DAGScheduler: Stage 27 (collect at EdgeRDD.scala:51) finished in 0.015 s
> 14/03/04 16:15:57 INFO SparkContext: Job finished: collect at EdgeRDD.scala:51, took 0.023602266 s
> res7: Array[org.apache.spark.graphx.Edge[Int]] = Array(Edge(2,1,7), Edge(2,4,2), Edge(3,2,4), Edge(3,6,3), Edge(4,1,1), Edge(5,2,2), Edge(5,3,8), Edge(5,6,3))
> scala> graph.vertices.toArray
> 14/03/04 16:16:18 INFO SparkContext: Starting job: toArray at <console>:27
> 14/03/04 16:16:18 INFO DAGScheduler: Got job 6 (toArray at <console>:27) with 1 output partitions (allowLocal=false)
> 14/03/04 16:16:18 INFO DAGScheduler: Final stage: Stage 28 (toArray at <console>:27)
> 14/03/04 16:16:18 INFO DAGScheduler: Parents of final stage: List(Stage 32, Stage 29)
> 14/03/04 16:16:18 INFO DAGScheduler: Missing parents: List()
> 14/03/04 16:16:18 INFO DAGScheduler: Submitting Stage 28 (VertexRDD[15] at RDD at VertexRDD.scala:52), which has no missing parents
> 14/03/04 16:16:18 INFO DAGScheduler: Submitting 1 missing tasks from Stage 28 (VertexRDD[15] at RDD at VertexRDD.scala:52)
> 14/03/04 16:16:18 INFO TaskSchedulerImpl: Adding task set 28.0 with 1 tasks
> 14/03/04 16:16:18 INFO TaskSetManager: Starting task 28.0:0 as TID 12 on executor localhost: localhost (PROCESS_LOCAL)
> 14/03/04 16:16:18 INFO TaskSetManager: Serialized task 28.0:0 as 2426 bytes in 0 ms
> 14/03/04 16:16:18 INFO Executor: Running task ID 12
> 14/03/04 16:16:18 INFO BlockManager: Found block rdd_14_0 locally
> 14/03/04 16:16:18 INFO Executor: Serialized size of result for 12 is 947
> 14/03/04 16:16:18 INFO Executor: Sending result for 12 directly to driver
> 14/03/04 16:16:18 INFO Executor: Finished task ID 12
> 14/03/04 16:16:18 INFO TaskSetManager: Finished TID 12 in 13 ms on localhost (progress: 0/1)
> 14/03/04 16:16:18 INFO DAGScheduler: Completed ResultTask(28, 0)
> 14/03/04 16:16:18 INFO TaskSchedulerImpl: Remove TaskSet 28.0 from pool
> 14/03/04 16:16:18 INFO DAGScheduler: Stage 28 (toArray at <console>:27) finished in 0.015 s
> 14/03/04 16:16:18 INFO SparkContext: Job finished: toArray at <console>:27, took 0.027839851 s
> res9: Array[(org.apache.spark.graphx.VertexId, (String, Int))] = Array((4,(David,42)), (2,(Bob,27)), (6,(Fran,50)), (5,(Ed,55)), (3,(Charlie,65)), (1,(Alice,28)))
> scala> graph.triplets.toArray
> 14/03/04 16:16:30 INFO SparkContext: Starting job: toArray at <console>:27
> 14/03/04 16:16:30 INFO DAGScheduler: Got job 7 (toArray at <console>:27) with 1 output partitions (allowLocal=false)
> 14/03/04 16:16:31 INFO DAGScheduler: Final stage: Stage 33 (toArray at <console>:27)
> 14/03/04 16:16:31 INFO DAGScheduler: Parents of final stage: List(Stage 34)
> 14/03/04 16:16:31 INFO DAGScheduler: Missing parents: List()
> 14/03/04 16:16:31 INFO DAGScheduler: Submitting Stage 33 (ZippedPartitionsRDD2[32] at zipPartitions at GraphImpl.scala:60), which has no missing parents
> 14/03/04 16:16:31 INFO DAGScheduler: Submitting 1 missing tasks from Stage 33 (ZippedPartitionsRDD2[32] at zipPartitions at GraphImpl.scala:60)
> 14/03/04 16:16:31 INFO TaskSchedulerImpl: Adding task set 33.0 with 1 tasks
> 14/03/04 16:16:31 INFO TaskSetManager: Starting task 33.0:0 as TID 13 on executor localhost: localhost (PROCESS_LOCAL)
> 14/03/04 16:16:31 INFO TaskSetManager: Serialized task 33.0:0 as 3322 bytes in 1 ms
> 14/03/04 16:16:31 INFO Executor: Running task ID 13
> 14/03/04 16:16:31 INFO BlockManager: Found block rdd_2_0 locally
> 14/03/04 16:16:31 INFO BlockManager: Found block rdd_31_0 locally
> 14/03/04 16:16:31 INFO Executor: Serialized size of result for 13 is 931
> 14/03/04 16:16:31 INFO Executor: Sending result for 13 directly to driver
> 14/03/04 16:16:31 INFO Executor: Finished task ID 13
> 14/03/04 16:16:31 INFO TaskSetManager: Finished TID 13 in 17 ms on localhost (progress: 0/1)
> 14/03/04 16:16:31 INFO DAGScheduler: Completed ResultTask(33, 0)
> 14/03/04 16:16:31 INFO TaskSchedulerImpl: Remove TaskSet 33.0 from pool
> 14/03/04 16:16:31 INFO DAGScheduler: Stage 33 (toArray at <console>:27) finished in 0.019 s
> 14/03/04 16:16:31 INFO SparkContext: Job finished: toArray at <console>:27, took 0.037909394 s
> res10: Array[org.apache.spark.graphx.EdgeTriplet[(String, Int),Int]] = Array(((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3))
> ```
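The symptom in the log (the last triplet repeated eight times) is what collecting an iterator that reuses a single mutable object looks like. The sketch below is not GraphX's actual code; `Triplet` and `reusingIterator` are hypothetical stand-ins that reproduce the pitfall and the usual fix of copying each element into an immutable value before collecting:

```scala
// Hypothetical mutable record, standing in for a reused triplet object.
final class Triplet(var src: Long, var dst: Long, var attr: Int)

// Hypothetical iterator that yields the SAME Triplet instance for every
// element, mutating its fields in place — the suspected failure mode.
def reusingIterator(edges: Seq[(Long, Long, Int)]): Iterator[Triplet] = {
  val t = new Triplet(0, 0, 0)
  edges.iterator.map { case (s, d, a) =>
    t.src = s; t.dst = d; t.attr = a
    t // same object every time
  }
}

val edges = Seq((2L, 1L, 7), (3L, 2L, 4), (5L, 6L, 3))

// Collecting the raw iterator stores three references to one object,
// so every slot shows the state of the last edge.
val broken = reusingIterator(edges).toArray

// Copying each element into a fresh immutable tuple first preserves
// the per-element values.
val fixed = reusingIterator(edges).map(t => (t.src, t.dst, t.attr)).toArray
```

On an affected release, the analogous workaround would be to materialize each real triplet into immutable data before collecting, e.g. `graph.triplets.map(t => (t.srcId, t.dstId, t.attr)).collect()` (those are standard `EdgeTriplet` fields); on the fixed master branch, `graph.triplets` collects correctly as-is.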



--
This message was sent by Atlassian JIRA
(v6.2#6252)
