spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Apache Spark (JIRA)" <>
Subject [jira] [Commented] (SPARK-3427) Avoid active vertex tracking in static PageRank
Date Sun, 07 Sep 2014 01:43:28 GMT


Apache Spark commented on SPARK-3427:

User 'ankurdave' has created a pull request for this issue:

> Avoid active vertex tracking in static PageRank
> -----------------------------------------------
>                 Key: SPARK-3427
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>            Reporter: Ankur Dave
>            Assignee: Ankur Dave
> GraphX's current implementation of static (fixed iteration count) PageRank uses the Pregel
API. This unnecessarily tracks active vertices, even though in static PageRank all vertices
are always active. Active vertex tracking incurs the following costs:
> 1. A shuffle per iteration to ship the active sets to the edge partitions.
> 2. A hash table creation per iteration at each partition to index the active sets for
> 3. A hash lookup per edge to check whether the source vertex is active.
> I reimplemented static PageRank using the lower-level GraphX API instead of the Pregel
API. In benchmarks on a 16-node m2.4xlarge cluster, this provided a 23% speedup (from 514
s to 397 s, mean over 3 trials) for 10 iterations of PageRank on a synthetic graph with 10M
vertices and 1.27B edges.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message