Hi Jon,
(Dropping goldenorb@googlegroups.com from this reply so as not to clog
up their mailing list uninvited.)
First of all, thank you for sharing this comparison. I would like to
note a few things. The results I posted in October 2011 were actually a
bit old (run in June 2011) and predate several improvements that
reduce memory usage significantly (namely GIRAPH-12 and GIRAPH-91). The
number of vertices loadable per worker is highly dependent on the number
of edges per worker, the amount of available heap memory, number of
messages, the balancing of the graph across the workers, etc. In recent
tests at Facebook, I have been able to load over 10 million vertices /
worker easily with 20 edges / vertex. I know you wrote that the
maximum per worker was at least 1.6 million vertices for Giraph; I just
wanted to let folks know that it's in fact much higher. We'll work on
continuing to improve that in the future as today's graph problems are
in the billions of vertices or rather hundreds of billions =).
Also, with respect to scalability, if I'm interpreting these results
correctly, does it mean that GoldenOrb is currently unable to load more
than 250k vertices / cluster as observed by former Ravel developers? If
so, given the small tests and overhead per superstep, I wouldn't expect
the scalability to be much improved by more workers. Also, the max
value and shortest paths algorithms are highly data dependent in how
many messages are passed around per superstep, so they are perhaps not a
fair scaling comparison with Giraph's PageRank benchmark test, which was
designed for scalability (equal messages per superstep, distributed
evenly across vertices). It would be nice to see an apples-to-apples
comparison if someone has the time...=)
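
To illustrate the point about data dependence, here is a minimal
sketch (plain Python, not Giraph or GoldenOrb code) that counts the
messages sent per superstep in a Pregel-style shortest paths run and in
a PageRank-style run on the same toy graph. The function names, the
example graph, and the 0.85 damping factor are assumptions made up for
this illustration.

```python
from collections import defaultdict

def sssp_message_counts(edges, source):
    """Pregel-style shortest paths; returns messages sent in each superstep."""
    out = defaultdict(list)
    for u, v, w in edges:
        out[u].append((v, w))
    dist = defaultdict(lambda: float("inf"))
    inbox = {source: [0]}  # the source starts with distance 0
    counts = []
    while inbox:
        outbox = defaultdict(list)
        sent = 0
        for v, msgs in inbox.items():
            best = min(msgs)
            # A vertex only sends messages when its distance improves.
            if best < dist[v]:
                dist[v] = best
                for nbr, w in out[v]:
                    outbox[nbr].append(best + w)
                    sent += 1
        counts.append(sent)
        inbox = outbox
    return counts

def pagerank_message_counts(edges, num_vertices, supersteps):
    """PageRank-style run; every vertex sends on every out-edge each superstep."""
    out = defaultdict(list)
    for u, v, _ in edges:
        out[u].append(v)
    rank = {v: 1.0 / num_vertices for v in range(num_vertices)}
    counts = []
    for _ in range(supersteps):
        incoming = defaultdict(float)
        sent = 0
        for v in range(num_vertices):
            for nbr in out[v]:
                incoming[nbr] += rank[v] / len(out[v])
                sent += 1
        rank = {v: 0.15 / num_vertices + 0.85 * incoming[v]
                for v in range(num_vertices)}
        counts.append(sent)
    return counts

# Hypothetical toy graph: (source, target, weight), vertices 0..5.
EDGES = [(0, 1, 1), (0, 2, 1), (0, 3, 1),
         (1, 4, 1), (2, 4, 1), (3, 4, 1), (4, 5, 1)]

print(sssp_message_counts(EDGES, 0))         # [3, 3, 1, 0]
print(pagerank_message_counts(EDGES, 6, 4))  # [7, 7, 7, 7]
```

The shortest paths message count rises and falls with the shape of the
graph and the choice of source, while the PageRank-style count stays
constant at the number of edges, which is why the two make for an
uneven scaling comparison.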
Thanks,
Avery
On 12/10/11 3:16 PM, Jon Allen wrote:
> Since GoldenOrb was released this past summer, a number of people have asked questions
> regarding scalability and performance testing, as well as a comparison of these results with
> those of Giraph ( http://incubator.apache.org/giraph/ ), so I went forward with running tests
> to help answer some of these questions.
>
> A full report of the scalability testing results, along with methodology details, relevant
> information regarding testing and analysis, links to data points for Pregel and Giraph, scalability
> testing references, and background mathematics, can be found here:
>
> http://wwwrel.ph.utexas.edu/Members/jon/golden_orb/
>
> Since this data will also be of interest to the Giraph community (for methodology, background
> references, and analysis reasons), I am cross posting to the Giraph user mailing list.
>
> A synopsis of the scalability results for GoldenOrb, and comparison data points for Giraph
> and Google's Pregel framework are provided below.
>
> The setup and execution of GoldenOrb scalability tests were conducted by three former
> Ravel (http://www.raveldata.com ) developers, including myself, with extensive knowledge of
> the GoldenOrb code base and optimal system configurations, ensuring the most optimal settings
> were used for scalability testing.
>
>
> RESULTS SUMMARY:
>
>
> MAX CAPACITY:
>
> Pregel (at least): 166,666,667 vertices per node.
>
> Giraph (at least): 1,666,667 vertices per worker.
>
> GoldenOrb: ~ 100,000 vertices per node, 33,333 vertices per worker.
>
>
> STRONG SCALING (SSSP):
> Note: Optimal parallelization corresponds to the minimum value -1.0. Deviation from the
> minimum possible value of -1.0 corresponds to non-optimal parallelization.
>
> Pregel: -0.924 (1 billion total vertices)
>
> Giraph: -0.934 (250 Million total vertices)
>
> GoldenOrb: -0.031 Average, -0.631 Best (100000 total vertices), -0.020 Worst (1000 total
> vertices)
>
>
> WEAK SCALING (SSSP):
> Note: Optimal weak scalability corresponds to the value 0.0. Deviation from the optimal
> value of 0.0 corresponds to non-optimal usage of computational resources as managed by the
> framework.
>
> Pregel: No Data Available
>
> Giraph: 0.01 (1,666,667 vertices per worker)
>
> GoldenOrb: 0.37 Average, 0.23 Best (500 vertices per node), 0.48 Worst (12500 vertices
> per node)
>
>
>
> I hope this answers some of the many questions which have been posted regarding scalability
> and performance. Be sure to check out the full scalability testing report at
> http://wwwrel.ph.utexas.edu/Members/jon/golden_orb/
> Please let me know if you have any questions.
>
> Thanks,
> Jon
