giraph-user mailing list archives

From Jonathan Bishop <jbishop....@gmail.com>
Subject Re: Giraph : newbie questions
Date Fri, 20 Jul 2012 18:52:02 GMT
Avery,

Is there an example of overriding the partitioner in the Giraph 0.1
distribution?

Thanks,

Jon

On Tue, Jul 17, 2012 at 11:00 AM, Avery Ching <aching@apache.org> wrote:

> Answers inline.
>
>
> On 7/17/12 1:22 AM, Nicolas DUGUE wrote:
>
>> Thanks for your answer, David!
>>
>> Okay, but is there a way to force Giraph to partition the graph in our
>> own way, and how would we do that? It may be useful for minimizing
>> communication between Giraph nodes.
>>
> The partitioning method is very customizable.  See
> GraphPartitionerFactory as the interface you need to implement.
> HashPartitionerFactory is what we use as the default, but you can implement
> your own.
>
>
>> You mentioned starting the job with a minimum of vertices and adding
>> new vertices later. That seems really interesting; how do we do that,
>> and how does it work?
>>
> The graph is mutable as the application is running.  See MutableVertex for
> all the local and remote mutations you can make.
>
>
>> For example, I run my Giraph job with half of the vertices and, during my
>> first superstep, I add (I don't know how) some vertices to my file. Will
>> these vertices be taken into account in my first superstep or only in the
>> next superstep?
>> And once the vertices are loaded, is it possible to remove them from
>> memory? In other words, I can add new vertices; can I remove vertices too?
>> So, is it possible to change the topology of my graph dynamically?
>>
> Yes, see above.
>
>
>> Moreover, I'm still wondering which is best: launching one VM with
>> Giraph on each server with 20 GB of RAM, or launching two of them with
>> 10 GB of RAM each?
>>
> Well, in that case, I'm guessing one server with 20 GB since there would
> be no communication (most of the effort).
>
>
>> And finally, when I launch a Giraph job, ZooKeeper is loaded in one
>> virtual machine alone. Is there a way to run some Giraph jobs in this
>> virtual machine too? Or to specify explicitly which VM runs the
>> ZooKeeper job?
>>
> ZooKeeper runs in the same slot as the master process, not sure you'd
> want to do more there as it's best to balance the memory usage across the
> workers.
>
>> Best regards,
>> Nicolas
>>
>> On 16/07/2012 21:51, David Garcia wrote:
>>
>>> Giraph partitions the vertices using a hashing function that's basically
>>> the equivalent of (hash(vertexID) mod #ofComputeNodes).
>>> You can mitigate memory issues by starting the job with a minimum of
>>> vertices in your file and then adding more dynamically as your job
>>> progresses (assuming that your job doesn't require all of the vertices).
>>>
>>> -David
>>>
>>>
>>> On 7/16/12 4:36 AM, "Nicolas DUGUE" <nicolas.dugue@univ-orleans.fr>
>>> wrote:
>>>
>>>> Hi everybody,
>>>>
>>>>      I'm new to Giraph, so I have a few questions about how it works and
>>>> how to configure it to work as well as possible.
>>>>      We have set up a cluster of 6 servers with 24 CPUs and 24 GB of RAM,
>>>> and we want to use it to experiment with Giraph.
>>>>      So far we've made a few runs and have had some memory problems;
>>>> it seems that we don't give enough memory to the JVM (GC
>>>> overhead, OutOfMemory, ...).
>>>>      Our experiments were PageRank benchmarks. We only succeeded
>>>> in running it on a 100-million-edge graph by running two virtual
>>>> machines with 8 GB of RAM on each of our servers.
>>>>
>>>>      Here are our questions:
>>>>      - Which is best: launching one VM with Giraph on each server
>>>> with 20 GB of RAM, or launching two of them with 10 GB of RAM each?
>>>>      - Is there a way to minimize the memory used by Hadoop to give
>>>> more memory to the Giraph jobs?
>>>>      - How is the graph distributed across the cluster? Our graph may
>>>> be a power-law graph with a few nodes with a very large number of edges
>>>> and a lot of nodes with few edges. How will Giraph distribute this
>>>> kind of graph? Does it take into account the number of edges of each
>>>> vertex?
>>>>
>>>> Thanks in advance,
>>>> Nicolas Dugué
>>>> PhD student at the University of Orléans
>>>>
>>>
>>
>>
>
>
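
The default behaviour David describes in the thread, roughly
hash(vertexID) mod number of compute nodes, can be sketched in plain Java.
This is only an illustration and does not use the Giraph API: the class
HashPartitionSketch, the methods partitionFor and edgeLoad, and the sample
degrees are hypothetical names and made-up data. It also hints at why a purely
hash-based assignment does not look at the number of edges per vertex.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: not Giraph code; all names here are hypothetical. */
public class HashPartitionSketch {

    /** Maps a vertex ID to a partition index in [0, numPartitions). */
    public static int partitionFor(long vertexId, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(Long.hashCode(vertexId), numPartitions);
    }

    /** Sums the edge counts of the vertices assigned to each partition. */
    public static long[] edgeLoad(Map<Long, Integer> degrees, int numPartitions) {
        long[] load = new long[numPartitions];
        for (Map.Entry<Long, Integer> e : degrees.entrySet()) {
            load[partitionFor(e.getKey(), numPartitions)] += e.getValue();
        }
        return load;
    }

    public static void main(String[] args) {
        // Made-up power-law-like sample: one hub vertex and a few small ones.
        Map<Long, Integer> degrees = new LinkedHashMap<>();
        degrees.put(0L, 1_000_000);
        degrees.put(1L, 3);
        degrees.put(2L, 3);
        degrees.put(6L, 5);

        long[] load = edgeLoad(degrees, 6);
        for (int p = 0; p < load.length; p++) {
            System.out.println("partition " + p + ": " + load[p] + " edges");
        }
    }
}
```

In this sketch the assignment depends only on the vertex ID, so all of a hub
vertex's edges end up in whichever partition its ID hashes to. Accounting for
degree would presumably be the job of a custom partitioner plugged in through
GraphPartitionerFactory, as Avery suggests.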
