giraph-user mailing list archives

From Nicolas DUGUE <nicolas.du...@univ-orleans.fr>
Subject Giraph : newbie questions
Date Mon, 16 Jul 2012 09:36:33 GMT
Hi everybody,

     I'm new to Giraph, so I have a few questions about how it works and 
how to configure it to run as well as possible.
     We have set up a cluster of 6 servers, each with 24 CPUs and 24 GB of 
RAM, and we want to use it to experiment with Giraph.
     So far we have made a few runs and hit memory problems: it seems 
that we don't give the JVM enough memory (GC overhead, OutOfMemory, 
...).
     Our experiments were PageRank benchmarks. We only succeeded in 
running it on a graph with 100 million edges, by running two virtual 
machines with 8 GB of RAM each on every server.

     Here are our questions:
     - Which is better: launching one VM with Giraph on each server 
with 20 GB of RAM, or launching two of them with 10 GB of RAM each?
     - Is there a way to reduce the memory used by Hadoop so that more 
memory is available to the Giraph jobs?
     - How is the graph distributed across the cluster? Our graph may 
be a power-law graph, with a few nodes that have a very large number of 
edges and many nodes that have only a few. How will Giraph distribute 
this kind of graph? Does it take into account the number of edges of 
each vertex?
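
     To make the last question concrete, here is a minimal Python sketch 
of the imbalance we are worried about. It assumes, purely for 
illustration, that each vertex is stored with all of its out-edges on 
worker `vertex_id % num_workers`; we do not know whether Giraph actually 
places vertices this way. The function name `edges_per_worker` and the 
toy graph are hypothetical.

```python
from collections import Counter


def edges_per_worker(edges, num_workers):
    """Count the out-edges stored on each worker.

    ASSUMPTION (not confirmed for Giraph): a vertex and all of its
    out-edges live on worker `vertex_id % num_workers`.
    """
    load = Counter()
    for src, _dst in edges:
        load[src % num_workers] += 1
    return [load[w] for w in range(num_workers)]


# Toy power-law-like graph: vertex 0 is a hub with 1000 out-edges,
# vertices 1..99 have 2 out-edges each.
hub_edges = [(0, dst) for dst in range(1, 1001)]
tail_edges = [(v, (v + k) % 100) for v in range(1, 100) for k in (1, 2)]

# One worker per server in our 6-server cluster.
loads = edges_per_worker(hub_edges + tail_edges, num_workers=6)
print(loads)
```

     Under this assumption the worker that holds the hub stores far more 
edges than the others, even though every worker holds roughly the same 
number of vertices.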

Thanks in advance,
Nicolas Dugué
PhD student at the University of Orléans
