giraph-user mailing list archives

From David Garcia <>
Subject Re: Giraph : newbie questions
Date Mon, 16 Jul 2012 19:51:50 GMT
Giraph partitions the vertices with a hashing function, essentially the
equivalent of hash(vertexID) mod the number of workers.
You can mitigate memory issues by starting the job with a minimal set of
vertices in your input file and then adding the rest dynamically as the job
progresses (assuming your job doesn't require all of the vertices up front).
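The hash-mod assignment described above can be sketched in plain Java. This is a simplified model of the scheme, not Giraph's actual partitioner code; the class name, the `partitionFor` helper, and the use of long vertex IDs are illustrative assumptions:

```java
// Simplified model of hash-based vertex partitioning: each vertex ID is
// hashed and the result is taken modulo the number of partitions.
// Math.floorMod keeps the result non-negative even for negative hashes.
public class HashPartitionSketch {

    // Hypothetical helper: which partition a vertex with this ID lands in.
    static int partitionFor(long vertexId, int numPartitions) {
        int hash = Long.hashCode(vertexId);
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4; // e.g. one partition per worker
        long[] vertexIds = {0L, 1L, 12345L, 99999L};
        for (long id : vertexIds) {
            System.out.println("vertex " + id + " -> partition "
                    + partitionFor(id, numPartitions));
        }
    }
}
```

Note that this scheme balances the *number of vertices* per worker, not the number of edges, so on a power-law graph a few very high-degree vertices can still concentrate a disproportionate amount of edge data and message traffic on one worker.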


On 7/16/12 4:36 AM, "Nicolas DUGUE" <> wrote:

>Hi everybody,
>     I'm new to Giraph, so I have a few questions about how it works and
>how to configure it to work as well as possible.
>     We have set up a cluster of 6 servers, each with 24 CPUs and 24GB of
>RAM, and we want to use it to experiment with Giraph.
>     We've made a few runs so far and have run into memory problems; it
>seems we don't give enough memory to the JVM (GC overhead, OutOfMemory,
>...).
>     Our experiments were PageRank benchmarks; we only succeeded in
>running one on a 100-million-edge graph by running two virtual machines
>with 8GB of RAM on each of our servers.
>     Here are our questions:
>     - Which is better: launching one Giraph VM with 20GB of RAM on each
>server, or launching two VMs with 10GB of RAM each?
>     - Is there a way to minimize the memory used by Hadoop in order to
>give more memory to the Giraph jobs?
>     - How is the graph distributed across the cluster? Our graph may be
>a power-law graph, with a few nodes that have a very large number of
>edges and many nodes with few edges. How will Giraph distribute this
>kind of graph? Does it take into account the number of edges of each
>vertex?
>Thanks in advance,
>Nicolas Dugué
>PhD student at the University of Orléans
