Mailing-List: contact user-help@giraph.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@giraph.apache.org
Received-SPF: unknown (athena.apache.org: error in processing during lookup of
 kumbhare@usc.edu)
MIME-Version: 1.0
Date: Tue, 3 Sep 2013 12:30:36 -0700
Message-ID: 
 <CAPHwgDjhLafTOJhEVv0S_C3kVtog=Hd5_XW8V77vJwZ_sZVsYg@mail.gmail.com>
Subject: Benchmarking Giraph
From: Alok Kumbhare <kumbhare@usc.edu>
To: user@giraph.apache.org
Content-Type: multipart/alternative; boundary=047d7b1632e928e62c04e57fb91c

--047d7b1632e928e62c04e57fb91c
Content-Type: text/plain; charset=ISO-8859-1

Hi,
We are looking into Giraph benchmarks to compare against a similar
programming model and framework that we are working on.

As a start, we are planning to benchmark the following algorithms on data
sets with more than a billion edges.

1. Single Source Shortest Path from a given source
2. PageRank
3. Connected Components

We have a small cluster of 16 nodes (8 cores/16 GB each) to run the
benchmarks. Given that, we have a few questions to help us get the best out
of Giraph.

1. Which version of Giraph should we use, 1.0 or trunk, to take advantage
of the optimizations (memory optimization/caching, multi-threading, etc.)
mentioned here?
https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920

2. Are the samples in the Giraph distribution for the above algorithms a
good place to start? How can we take advantage of the different
optimizations, including aggregators/combiners, for these algorithms?

3. Is there a document I can look at to understand the best practices for
implementing optimized vertex-centric code using the latest features, along
with deployment guidelines to maximize utilization?
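
To make the combiner question concrete, here is a minimal,
framework-agnostic sketch of vertex-centric single source shortest path
with a min combiner. It does not use the Giraph API; the function name,
the edge-list layout, and the message counters are all hypothetical and
exist only to illustrate the idea on one machine.

```python
import math


def sssp_supersteps(edges, source):
    """Vertex-centric SSSP in synchronous supersteps with a min combiner.

    edges: dict mapping vertex -> list of (neighbor, weight) pairs
           (a hypothetical in-memory layout, not a Giraph input format).
    Returns (distances, messages_sent, messages_delivered).
    """
    dist = {}
    inbox = {source: 0.0}  # one combined message per target vertex
    sent = 0
    delivered = 0
    while inbox:
        outbox = {}
        for vertex, candidate in inbox.items():
            # A vertex only does work when it receives a shorter distance.
            if candidate < dist.get(vertex, math.inf):
                dist[vertex] = candidate
                for neighbor, weight in edges.get(vertex, []):
                    sent += 1
                    message = candidate + weight
                    # Min combiner: keep only the smallest message per target.
                    if message < outbox.get(neighbor, math.inf):
                        outbox[neighbor] = message
        # After combining, at most one message per target is delivered.
        delivered += len(outbox)
        inbox = outbox
    return dist, sent, delivered


edges = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 5)],
    "c": [("d", 1)],
    "d": [],
}
dist, sent, delivered = sssp_supersteps(edges, "a")
```

On this toy graph the two messages addressed to "d" in the same superstep
are combined into one, so fewer messages are delivered than sent. The
question is how best to get this kind of saving in Giraph itself.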

Looking forward to your help.

Thanks,
Alok Kumbhare

--047d7b1632e928e62c04e57fb91c--