giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Giraph : newbie questions
Date Tue, 17 Jul 2012 18:00:59 GMT
Answers inline.

On 7/17/12 1:22 AM, Nicolas DUGUE wrote:
> Thanks for your answer David !
>
> Okay, but is there a way to force Giraph to partition the graph in
> our own way, and if so, how? It may be useful for minimizing
> communication between Giraph nodes.
>
The partitioning method is very customizable.  See 
GraphPartitionerFactory as the interface you need to implement. 
HashPartitionerFactory is what we use as the default, but you can 
implement your own.
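
To make the idea concrete, here is a rough, self-contained sketch of the two kinds of mapping being discussed. It is not Giraph's GraphPartitionerFactory API; the class PartitionSketch and both method names are made up for illustration, and the range scheme assumes vertex ids lie in [0, maxVertexId]. The first method mirrors the "hash(vertexID) mod number of workers" idea; the second is one hypothetical custom scheme that keeps contiguous ids on the same worker.

```java
/** Toy partitioning functions; NOT Giraph's GraphPartitionerFactory API. */
public class PartitionSketch {

    /** Hash-style scheme: hash(vertexId) mod number of workers. */
    public static int hashPartition(long vertexId, int numWorkers) {
        // floorMod keeps the result in [0, numWorkers) even for negative hashes.
        return Math.floorMod(Long.hashCode(vertexId), numWorkers);
    }

    /**
     * Hypothetical custom scheme: contiguous id ranges, so vertices with
     * nearby ids land on the same worker. Assumes 0 <= vertexId <= maxVertexId.
     */
    public static int rangePartition(long vertexId, long maxVertexId, int numWorkers) {
        long rangeSize = (maxVertexId / numWorkers) + 1;
        return (int) (vertexId / rangeSize);
    }

    public static void main(String[] args) {
        int workers = 4;
        long maxId = 99;
        for (long id : new long[] {0, 1, 2, 50, 99}) {
            System.out.printf("vertex %d -> hash: %d, range: %d%n",
                id, hashPartition(id, workers), rangePartition(id, maxId, workers));
        }
    }
}
```

A range scheme like this only reduces communication if ids that are close together also tend to be neighbours in the graph; whether that holds depends on how your ids were assigned.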

> You mentioned starting the job with a minimum of vertices and adding
> new vertices later. That seems really interesting; how do we do that,
> and how does it work?
The graph is mutable as the application is running.  See MutableVertex 
for all the local and remote mutations you can make.

> For example, I run my Giraph job with half of the vertices and during
> my first superstep, I add (I don't know how) some vertices to my file.
> Will these vertices be taken into account in my first superstep or only
> in the next superstep?
> And once the vertices are loaded, is it possible to remove them from
> memory? In other words, I can add new vertices; can I remove vertices
> too? So, is it possible to change the topology of my graph dynamically?
>
Yes, see above.
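
On the timing question, a toy model may help. The sketch below assumes Pregel-style semantics, where mutations requested during one superstep are applied at the barrier and become visible in the next superstep; please check the Giraph documentation for the exact behaviour. It does not use MutableVertex or any other Giraph class; MutationSketch and its methods are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of superstep-deferred graph mutations; NOT Giraph's API. */
public class MutationSketch {
    private final Set<Long> vertices = new TreeSet<>();
    private final List<Long> pendingAdds = new ArrayList<>();
    private final List<Long> pendingRemoves = new ArrayList<>();

    public MutationSketch(long... initialVertexIds) {
        for (long id : initialVertexIds) {
            vertices.add(id);
        }
    }

    /** Queue a vertex addition; it is not visible until the next superstep. */
    public void requestAdd(long id) {
        pendingAdds.add(id);
    }

    /** Queue a vertex removal; the vertex stays visible until the barrier. */
    public void requestRemove(long id) {
        pendingRemoves.add(id);
    }

    /** Barrier between supersteps: apply queued removals, then additions. */
    public void endSuperstep() {
        vertices.removeAll(pendingRemoves);
        vertices.addAll(pendingAdds);
        pendingRemoves.clear();
        pendingAdds.clear();
    }

    /** Snapshot of the vertices visible in the current superstep. */
    public Set<Long> activeVertices() {
        return new TreeSet<>(vertices);
    }

    public static void main(String[] args) {
        MutationSketch graph = new MutationSketch(1L, 2L);
        graph.requestAdd(3L);
        graph.requestRemove(1L);
        System.out.println("during superstep 0: " + graph.activeVertices());
        graph.endSuperstep();
        System.out.println("during superstep 1: " + graph.activeVertices());
    }
}
```

Under that assumption, a vertex added during the first superstep would first take part in the second one, and a removed vertex would still be present until the end of the superstep in which the removal was requested.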

> Moreover, I'm still wondering which is best: launching one VM with
> Giraph on each server with 20 GB of RAM, or launching two of them
> with 10 GB of RAM each?
>
Well, in that case, I'd guess one VM with 20 GB per server, since there
would be no communication between VMs on the same server (communication
is most of the effort).

> And finally, when I launch a Giraph job, ZooKeeper is loaded in one
> virtual machine by itself. Is there a way to run some Giraph work in
> this virtual machine too? Or to specify explicitly which VM runs the
> ZooKeeper job?
>
ZooKeeper runs in the same slot as the master process. I'm not sure
you'd want to do more there, as it's best to balance the memory usage
across the workers.

> Best regards,
> Nicolas
>
> On 16/07/2012 21:51, David Garcia wrote:
>> Giraph partitions the vertices using a hashing function that's basically
>> the equivalent of (hash(vertexID) mod #ofComputeNodes).
>> You can mitigate memory issues by starting the job with a minimum of
>> vertices in your file and then adding them dynamically as your job
>> progresses (assuming that your job doesn't require all of the vertices).
>>
>> -David
>>
>>
>> On 7/16/12 4:36 AM, "Nicolas DUGUE" <nicolas.dugue@univ-orleans.fr> wrote:
>>
>>> Hi everybody,
>>>
>>>      I'm new to Giraph, so I have a few questions about how it works
>>> and how to configure it to work as well as possible.
>>>      We have set up a cluster of 6 servers with 24 CPUs and 24 GB of
>>> RAM, and we want to use it to experiment with Giraph.
>>>      So far we have made a few runs and have had memory problems; it
>>> seems that we are not giving the JVM enough memory (GC overhead,
>>> OutOfMemory, ...).
>>>      Our experiments were benchmarks using PageRank. We only
>>> succeeded in running it on a graph with 100 million edges, by running
>>> two virtual machines with 8 GB of RAM each on each of our servers.
>>>
>>>      Here are our questions:
>>>      - Which is best: launching one VM with Giraph on each server
>>> with 20 GB of RAM, or launching two of them with 10 GB of RAM each?
>>>      - Is there a way to minimize the memory used by Hadoop, to give
>>> more memory to the Giraph jobs?
>>>      - How is the graph distributed across the cluster? Our graph may
>>> be a power-law graph, with a few nodes that have a very large number
>>> of edges and many nodes that have only a few. How will Giraph
>>> distribute this kind of graph? Does it take into account the number
>>> of edges of each vertex?
>>>
>>> Thanks in advance,
>>> Nicolas Dugué
>>> PhD student at the University of Orléans
>
>


