Mailing-List: contact giraph-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: giraph-user@incubator.apache.org
Received-SPF: pass (athena.apache.org: domain of claudio.martella@gmail.com
 designates 209.85.210.175 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <4EE653B0.4010301@apache.org>
References: <B459E720-AA9B-4F12-8DCB-4CF5F50FE541@gmail.com>
 <4EE4FE57.4030201@apache.org>
 <B0F441A7-F67B-4D66-BEB0-FB565622E140@gmail.com>
 <4EE653B0.4010301@apache.org>
From: Claudio Martella <claudio.martella@gmail.com>
Date: Mon, 12 Dec 2011 21:44:53 +0100
Message-ID: 
 <CAFJOoJcX-dYVompp0NXkd=3VY+8dnJhbrNkNK9y9k4kNmv25ng@mail.gmail.com>
Subject: Re: Scalability results for GoldenOrb and comparison with Giraph
To: giraph-user@incubator.apache.org
Cc: Jon Allen <ramanujon@gmail.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

This is all very interesting. As I wrote a few weeks ago also on
golden orb's ML, i thought about discussing a nice benchmarking
toolset at the graph devroom of FOSDEM with hama, goldenorb and giraph
devs.

Apparently everything got quite anticipated, cool :)

I believe the SSSP and PageRank algorithms are great examples for
benchmarking as they have a completely different messaging pattern.
There are though other "technicalities" to test, such as the
scalability of graph mutation operations, graph load etc.

Jon, thanks for your nice contribution from my side as well.

On Mon, Dec 12, 2011 at 8:19 PM, Avery Ching <aching@apache.org> wrote:
> Thanks for the detail on your experiments. =A0I certainly agree that it w=
ould
> be very useful to make some sort of scalability/performance testing
> framework to evaluate improvements. =A0Definitely would appreciate your h=
elp
> in putting one together. =A0We have a few benchmarks (PageRankBenchmark a=
nd
> RandomMessageBenchmark), but would appreciate any help you would like to
> provide.
>
> Otherwise, if that doesn't interest you, please have a look at the open
> JIRAs
>
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=3Dtrue&jq=
lQuery=3Dproject+%3D+GIRAPH+AND+resolution+%3D+Unresolved+AND+assignee+is+E=
MPTY+ORDER+BY+priority+DESC&mode=3Dhide
>
> and see what's interesting for you. =A0If nothing there interests you, fe=
el
> freel to discuss here or on giraph-dev or open up a JIRA. =A0=3D)
>
> Avery
>
>
> On 12/11/11 1:23 PM, Jon Allen wrote:
>>
>> Hi Avery,
>>
>> Thanks for the response. I reached out to the graph user mailing list
>> because I am quite interested in helping develop / execute standardized
>> scalability testing for Giraph, so I'm glad to see that there is interes=
t!
>>
>> Here's some follow up to some of the points you raised / questions you
>> asked:
>>
>> Currently, the biggest limitation faced by GoldenOrb is the capacity
>> issue; it can't handle more than roughly 100,000 vertices per node. This=
 low
>> maximum vertices per node limitation, coupled with instability issues,
>> obviously hampered the ability to conduct ideal scalability testing, but
>> even with graphs totaling 100,000 to 250,000 vertices a clear power law
>> slope can be found before hitting an inevitable communication bottleneck=
.
>> This can be seen by noting that the log-log slopes of the 20k, 50k, and =
100k
>> graphs (for SSSP) remain fairly constant, and negative, as the number of
>> nodes in the cluster grows, unlike the slopes for 5k, 2k, and 1k graphs
>> which demonstrate a framework overhead bottleneck, corresponding to the
>> point where the slope changes from negative to roughly 0 or positive (wh=
ich
>> appears to happen at around 1k vertices per node).
>>
>> On to the second issue you brought up=85
>>
>> Graph problems can be notoriously difficult to implement scalability
>> testing for precisely for the reasons you brought up. A few things were =
done
>> to allow an apples-to-apples comparison with the Pregel results. First, =
the
>> single source shortest path algorithm used for testing comes directly fr=
om
>> the Pregel paper. Second, just as in the Pregel tests, binary tree graph=
s
>> were used to ensure that each vertex had the same fixed, low order,
>> outdegree. Last, the tests were repeated using non-binary tree graphs
>> (generated by a python script) with a non-constant, but low order, avera=
ge
>> outdegree per vertex (average 10 edges per vertex, then again with graph=
s
>> averaging 90 edges per vertex), the results of which were seen to be qui=
te
>> close to the binary tree graph data.
>>
>> As mentioned in passing, the scalability test results allow for a direct
>> comparison with the Pregel results, but should also allow for a meaningf=
ul
>> comparison to your scalability results for Giraph precisely because the
>> edges per vertex have been fixed. While this is not ideal (I would prefe=
r a
>> standardized set of tests which everybody runs in standardized
>> configurations), the proposition that the results can be meaningfully
>> compared is backed up by two points; First, the log-log slope of the dat=
a
>> you presented is right in line with the value reported by Pregel for the=
ir
>> SSSP tests, both of which are realistic values (and show very good
>> parallelization!), meaning that both algorithms display similar properti=
es
>> for configurations in the regime not dominated by a framework overhead
>> bottleneck. And second, the GoldenOrb SSSP results being compared are al=
so
>> from configurations which have reached a steady power law slope over the
>> range of nodes considered, for runs using the same algorithm as the Preg=
el
>> results. These two points, I feel, justify the comparisons made (though,
>> again, it would be better to have a standardized set of configurations f=
or
>> testing to facilitate comparing results, even between algorithms). Since=
 all
>> three sets of scalability tests yield fairly linear complexity plots
>> (execution time vs. number of vertices in the graph, slide 29 of your ta=
lk),
>> it also makes sense to compare weak scaling results, a proposition suppo=
rted
>> by the consistency of the observed GoldenOrb weak scaling results for SS=
SP
>> across multiple test configurations.
>>
>>
>> As for the results found in your October 2011 talk, they are impressive
>> and clearly demonstrate an ability to effectively scale to large graph
>> problems (shown by the weak scaling slope of ~ 0.01) and to maximize the
>> benefit of throwing additional computational resources at a known proble=
m
>> (shown by the strong scaling slope of ~ -0.93), so I'm interested to see=
 the
>> results of the improvements that have been made. I'm a big proponent of
>> routine scalability testing using a fixed set of configurations as part =
of
>> the software testing process, as the comparable results help to quantify
>> "improvement" as the software is developed further and can often help to
>> identify unintended side effects of changes / find optimal configuration=
s
>> for various regimes of problems, and would like to see Giraph succeed, s=
o
>> let me know if there's any open issues which I might be able to dig into
>> (I'm on the dev mailing list as well, though haven't posted there).
>>
>> Thanks,
>> Jon
>>
>>
>> On Dec 11, 2011, at 1:02 PM, Avery Ching wrote:
>>
>>> Hi Jon,
>>>
>>> -goldenorb@googlegroups.com (so as to not clog up their mailing list
>>> uninvited)
>>>
>>> First of all, thank you for sharing this comparison. =A0I would like to
>>> note a few things. =A0The results I posted in October 2011 were actuall=
y a bit
>>> old (done in June 2011) and do not have several improvements that reduc=
e
>>> memory usage significantly (i.e. GIRAPH-12 and GIRAPH-91). =A0The numbe=
r of
>>> vertices loadable per worker is highly dependent on the number of edges=
 per
>>> worker, the amount of available heap memory, number of messages, the
>>> balancing of the graph across the workers, etc. =A0In recent tests at
>>> Facebook, I have been able to load over 10 million vertices / worker ea=
sily
>>> with 20 edges / vertex. =A0I know that you wrote that the maximum per w=
orker
>>> was at least 1.6 million vertices for Giraph, I just wanted to let folk=
s
>>> know that it's in fact much higher. =A0We'll work on continuing to impr=
ove
>>> that in the future as today's graph problems are in the billions of ver=
tices
>>> or rather hundreds of billions =3D).
>>>
>>> Also, with respect to scalability, if I'm interpreting these results
>>> correctly, does it mean that GoldenOrb is currently unable to load more=
 than
>>> 250k vertices / cluster as observed by former Ravel developers? =A0if s=
o,
>>> given the small tests and overhead per superstep, I wouldn't expect the
>>> scalability to be much improved by more workers. =A0Also, the max value=
 and
>>> shortest paths algorithms are highly data dependent to how many message=
s are
>>> passed around per superstep and perhaps not a fair scaling comparison w=
ith
>>> Giraph's scalability designed page rank benchmark test (equal messages =
per
>>> superstep distributed evenly across vertices). =A0Would be nice to see =
an
>>> apples-to-apples comparison if someone has the time...=3D)
>>>
>>> Thanks,
>>>
>>> Avery
>>>
>>> On 12/10/11 3:16 PM, Jon Allen wrote:
>>>>
>>>> Since GoldenOrb was released this past summer, a number of people have
>>>> asked questions regarding scalability and performance testing, as well=
 as a
>>>> comparison of these results with those of Giraph (
>>>> http://incubator.apache.org/giraph/ ), so I went forward with running =
tests
>>>> to help answer some of these questions.
>>>>
>>>> A full report of the scalability testing results, along with methodolo=
gy
>>>> details, relevant information regarding testing and analysis, links to=
 data
>>>> points for Pregel and Giraph, scalability testing references, and back=
ground
>>>> mathematics, can be found here:
>>>>
>>>> http://wwwrel.ph.utexas.edu/Members/jon/golden_orb/
>>>>
>>>> Since this data will also be of interest to the Giraph community (for
>>>> methodology, background references, and analysis reasons), I am cross
>>>> posting to the Giraph user mailing list.
>>>>
>>>> A synopsis of the scalability results for GoldenOrb, and comparison da=
ta
>>>> points for Giraph and Google's Pregel framework are provided below.
>>>>
>>>> The setup and execution of GoldenOrb scalability tests were conducted =
by
>>>> three former Ravel (http://www.raveldata.com ) developers, including m=
yself,
>>>> with extensive knowledge of the GoldenOrb code base and optimal system
>>>> configurations, ensuring the most optimal settings were used for scala=
bility
>>>> testing.
>>>>
>>>>
>>>> RESULTS SUMMARY:
>>>>
>>>>
>>>> MAX CAPACITY:
>>>>
>>>> Pregel (at least): 166,666,667 vertices per node.
>>>>
>>>> Giraph (at least): 1,666,667 vertices per worker.
>>>>
>>>> GoldenOrb: ~ 100,000 vertices per node, 33,333 vertices per worker.
>>>>
>>>>
>>>> STRONG SCALING (SSSP):
>>>> Note: Optimal parallelization corresponds to the minimum value -1.0.
>>>> Deviation from the minimum possible value of -1.0 corresponds to non-o=
ptimal
>>>> parallelization.
>>>>
>>>> Pregel: -0.924 (1 billion total vertices)
>>>>
>>>> Giraph: -0.934 (250 Million total vertices)
>>>>
>>>> GoldenOrb: -0.031 Average, -0.631 Best (100000 total vertices), 0.020
>>>> Worst (1000 total vertices)
>>>>
>>>>
>>>> WEAK SCALING (SSSP):
>>>> Note: Optimal weak scalability corresponds to the value 0.0. Deviation
>>>> from the optimal value of 0.0, corresponds to non-optimal usage of
>>>> computational resources as managed by the framework.
>>>>
>>>> Pregel: No Data Available
>>>>
>>>> Giraph: 0.01 (1,666,667 vertices per worker)
>>>>
>>>> GoldenOrb: 0.37 Average, 0.23 Best (500 vertices per node), 0.48 Worst
>>>> (12500 vertices per node)
>>>>
>>>>
>>>>
>>>> I hope this answers some of the many questions which have been posted
>>>> regarding scalability and performance. Be sure to check out the full
>>>> scalability testing report at
>>>> http://wwwrel.ph.utexas.edu/Members/jon/golden_orb/ =A0Please let me k=
now if
>>>> you have any questions.
>>>>
>>>> Thanks,
>>>> Jon
>
>
>


--=20
=A0 =A0Claudio Martella
=A0 =A0claudio.martella@gmail.com