Thanks for the detail on your experiments. I certainly agree that it
would be very useful to make some sort of scalability/performance
testing framework to evaluate improvements. Definitely would appreciate
your help in putting one together. We have a few benchmarks
(PageRankBenchmark and RandomMessageBenchmark), but would appreciate any
help you would like to provide.
Otherwise, if that doesn't interest you, please have a look at the open
JIRAs
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+GIRAPH+AND+resolution+%3D+Unresolved+AND+assignee+is+EMPTY+ORDER+BY+priority+DESC&mode=hide
and see what's interesting for you. If nothing there interests you,
feel freel to discuss here or on giraphdev or open up a JIRA. =)
Avery
On 12/11/11 1:23 PM, Jon Allen wrote:
> Hi Avery,
>
> Thanks for the response. I reached out to the graph user mailing list because I am quite
interested in helping develop / execute standardized scalability testing for Giraph, so I'm
glad to see that there is interest!
>
> Here's some follow up to some of the points you raised / questions you asked:
>
> Currently, the biggest limitation faced by GoldenOrb is the capacity issue; it can't
handle more than roughly 100,000 vertices per node. This low maximum vertices per node limitation,
coupled with instability issues, obviously hampered the ability to conduct ideal scalability
testing, but even with graphs totaling 100,000 to 250,000 vertices a clear power law slope
can be found before hitting an inevitable communication bottleneck. This can be seen by noting
that the loglog slopes of the 20k, 50k, and 100k graphs (for SSSP) remain fairly constant,
and negative, as the number of nodes in the cluster grows, unlike the slopes for 5k, 2k, and
1k graphs which demonstrate a framework overhead bottleneck, corresponding to the point where
the slope changes from negative to roughly 0 or positive (which appears to happen at around
1k vertices per node).
>
> On to the second issue you brought upâ€¦
>
> Graph problems can be notoriously difficult to implement scalability testing for precisely
for the reasons you brought up. A few things were done to allow an applestoapples comparison
with the Pregel results. First, the single source shortest path algorithm used for testing
comes directly from the Pregel paper. Second, just as in the Pregel tests, binary tree graphs
were used to ensure that each vertex had the same fixed, low order, outdegree. Last, the tests
were repeated using nonbinary tree graphs (generated by a python script) with a nonconstant,
but low order, average outdegree per vertex (average 10 edges per vertex, then again with
graphs averaging 90 edges per vertex), the results of which were seen to be quite close to
the binary tree graph data.
>
> As mentioned in passing, the scalability test results allow for a direct comparison with
the Pregel results, but should also allow for a meaningful comparison to your scalability
results for Giraph precisely because the edges per vertex have been fixed. While this is not
ideal (I would prefer a standardized set of tests which everybody runs in standardized configurations),
the proposition that the results can be meaningfully compared is backed up by two points;
First, the loglog slope of the data you presented is right in line with the value reported
by Pregel for their SSSP tests, both of which are realistic values (and show very good parallelization!),
meaning that both algorithms display similar properties for configurations in the regime not
dominated by a framework overhead bottleneck. And second, the GoldenOrb SSSP results being
compared are also from configurations which have reached a steady power law slope over the
range of nodes considered, for runs using the same algorithm as the Pregel results. These
two points, I feel, justify the comparisons made (though, again, it would be better to have
a standardized set of configurations for testing to facilitate comparing results, even between
algorithms). Since all three sets of scalability tests yield fairly linear complexity plots
(execution time vs. number of vertices in the graph, slide 29 of your talk), it also makes
sense to compare weak scaling results, a proposition supported by the consistency of the observed
GoldenOrb weak scaling results for SSSP across multiple test configurations.
>
>
> As for the results found in your October 2011 talk, they are impressive and clearly demonstrate
an ability to effectively scale to large graph problems (shown by the weak scaling slope of
~ 0.01) and to maximize the benefit of throwing additional computational resources at a known
problem (shown by the strong scaling slope of ~ 0.93), so I'm interested to see the results
of the improvements that have been made. I'm a big proponent of routine scalability testing
using a fixed set of configurations as part of the software testing process, as the comparable
results help to quantify "improvement" as the software is developed further and can often
help to identify unintended side effects of changes / find optimal configurations for various
regimes of problems, and would like to see Giraph succeed, so let me know if there's any open
issues which I might be able to dig into (I'm on the dev mailing list as well, though haven't
posted there).
>
> Thanks,
> Jon
>
>
> On Dec 11, 2011, at 1:02 PM, Avery Ching wrote:
>
>> Hi Jon,
>>
>> goldenorb@googlegroups.com (so as to not clog up their mailing list uninvited)
>>
>> First of all, thank you for sharing this comparison. I would like to note a few
things. The results I posted in October 2011 were actually a bit old (done in June 2011)
and do not have several improvements that reduce memory usage significantly (i.e. GIRAPH12
and GIRAPH91). The number of vertices loadable per worker is highly dependent on the number
of edges per worker, the amount of available heap memory, number of messages, the balancing
of the graph across the workers, etc. In recent tests at Facebook, I have been able to load
over 10 million vertices / worker easily with 20 edges / vertex. I know that you wrote that
the maximum per worker was at least 1.6 million vertices for Giraph, I just wanted to let
folks know that it's in fact much higher. We'll work on continuing to improve that in the
future as today's graph problems are in the billions of vertices or rather hundreds of billions
=).
>>
>> Also, with respect to scalability, if I'm interpreting these results correctly, does
it mean that GoldenOrb is currently unable to load more than 250k vertices / cluster as observed
by former Ravel developers? if so, given the small tests and overhead per superstep, I wouldn't
expect the scalability to be much improved by more workers. Also, the max value and shortest
paths algorithms are highly data dependent to how many messages are passed around per superstep
and perhaps not a fair scaling comparison with Giraph's scalability designed page rank benchmark
test (equal messages per superstep distributed evenly across vertices). Would be nice to
see an applestoapples comparison if someone has the time...=)
>>
>> Thanks,
>>
>> Avery
>>
>> On 12/10/11 3:16 PM, Jon Allen wrote:
>>> Since GoldenOrb was released this past summer, a number of people have asked
questions regarding scalability and performance testing, as well as a comparison of these
results with those of Giraph ( http://incubator.apache.org/giraph/ ), so I went forward with
running tests to help answer some of these questions.
>>>
>>> A full report of the scalability testing results, along with methodology details,
relevant information regarding testing and analysis, links to data points for Pregel and Giraph,
scalability testing references, and background mathematics, can be found here:
>>>
>>> http://wwwrel.ph.utexas.edu/Members/jon/golden_orb/
>>>
>>> Since this data will also be of interest to the Giraph community (for methodology,
background references, and analysis reasons), I am cross posting to the Giraph user mailing
list.
>>>
>>> A synopsis of the scalability results for GoldenOrb, and comparison data points
for Giraph and Google's Pregel framework are provided below.
>>>
>>> The setup and execution of GoldenOrb scalability tests were conducted by three
former Ravel (http://www.raveldata.com ) developers, including myself, with extensive knowledge
of the GoldenOrb code base and optimal system configurations, ensuring the most optimal settings
were used for scalability testing.
>>>
>>>
>>> RESULTS SUMMARY:
>>>
>>>
>>> MAX CAPACITY:
>>>
>>> Pregel (at least): 166,666,667 vertices per node.
>>>
>>> Giraph (at least): 1,666,667 vertices per worker.
>>>
>>> GoldenOrb: ~ 100,000 vertices per node, 33,333 vertices per worker.
>>>
>>>
>>> STRONG SCALING (SSSP):
>>> Note: Optimal parallelization corresponds to the minimum value 1.0. Deviation
from the minimum possible value of 1.0 corresponds to nonoptimal parallelization.
>>>
>>> Pregel: 0.924 (1 billion total vertices)
>>>
>>> Giraph: 0.934 (250 Million total vertices)
>>>
>>> GoldenOrb: 0.031 Average, 0.631 Best (100000 total vertices), 0.020 Worst (1000
total vertices)
>>>
>>>
>>> WEAK SCALING (SSSP):
>>> Note: Optimal weak scalability corresponds to the value 0.0. Deviation from the
optimal value of 0.0, corresponds to nonoptimal usage of computational resources as managed
by the framework.
>>>
>>> Pregel: No Data Available
>>>
>>> Giraph: 0.01 (1,666,667 vertices per worker)
>>>
>>> GoldenOrb: 0.37 Average, 0.23 Best (500 vertices per node), 0.48 Worst (12500
vertices per node)
>>>
>>>
>>>
>>> I hope this answers some of the many questions which have been posted regarding
scalability and performance. Be sure to check out the full scalability testing report at http://wwwrel.ph.utexas.edu/Members/jon/golden_orb/
Please let me know if you have any questions.
>>>
>>> Thanks,
>>> Jon
