giraph-user mailing list archives

From Robert Metzger <metrob...@gmail.com>
Subject OutOfMemoryError with large graphs
Date Sun, 11 Aug 2013 18:20:28 GMT
Hello Giraph Mailing List,

I'm a student at TU Berlin. For a project led by Sebastian Schelter
(Giraph committer), another student and I are implementing
algorithms to efficiently calculate the closeness of nodes in a graph. We
implemented a Flajolet-Martin sketch as described in "HADI: Fast Diameter
Estimation and Mining in Massive Graphs with Hadoop" (Kang et al.) and the
HyperLogLog sketch for space-efficient closeness computations in graphs.

We were able to run our implementations on small- and mid-sized graphs. The
largest graph we tested with has 177,147 nodes and 1,977,149,596 edges (it's
a Kronecker graph, generated using http://www.cs.cmu.edu/~ukang/dataset/).

We also wanted to run our implementations against this graph:
http://law.di.unimi.it/webdata/twitter-2010/ which has a size of 12.5 GB
when converted into ASCII. But I'm getting OutOfMemoryError exceptions when
using this graph. The exception is thrown from the input format, which
suggests that the system cannot load the whole graph into memory.
I'm running it on a 26-node cluster with 208 map tasks; each TaskTracker
has a heap of 2 GB, hence we have a total heap space of 416 GB.
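
As a rough sanity check of these numbers, here is a minimal sketch of the heap arithmetic. It assumes the 2 GB heap applies to each of the 208 map tasks (the reading under which the 416 GB total holds) and that the input splits evenly across tasks; the class and method names are hypothetical and only for illustration.

```java
public class HeapBudget {
    // Total heap across all tasks, in GB (assumes every task gets the same heap).
    static double totalHeapGb(int tasks, double heapPerTaskGb) {
        return tasks * heapPerTaskGb;
    }

    // How many times the on-disk input would fit into the total heap.
    static double heapToInputRatio(double totalHeapGb, double inputGb) {
        return totalHeapGb / inputGb;
    }

    // Share of the input per task in MB, assuming a perfectly even split.
    static double inputPerTaskMb(double inputGb, int tasks) {
        return inputGb * 1024.0 / tasks;
    }

    public static void main(String[] args) {
        int tasks = 208;            // map tasks, from the cluster description
        double heapPerTaskGb = 2.0; // assumed to be per map task
        double inputGb = 12.5;      // twitter-2010 converted to ASCII

        double total = totalHeapGb(tasks, heapPerTaskGb);
        System.out.printf("total heap: %.1f GB%n", total);
        System.out.printf("heap / input ratio: %.2f%n", heapToInputRatio(total, inputGb));
        System.out.printf("input per task: %.1f MB%n", inputPerTaskMb(inputGb, tasks));
    }
}
```

Under these assumptions the aggregate heap is many times larger than the ASCII input, so the raw input size alone would not explain the error. The in-memory representation of vertices and edges as Java objects is usually considerably larger than the text form, and an uneven split of the graph could push individual tasks over their 2 GB limit.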

I tried to use the out-of-core execution feature of Giraph, because it seems
to enable spilling to disk if the system runs out of memory. I enabled it using
the argument "-ca giraph.useOutOfCoreGraph=true" for the GiraphRunner. (Is
this the correct way to enable the feature?)

What can I do to get Giraph running with the Twitter graph?


Regards,
Robert
