giraph-user mailing list archives

From Arjun Sharma <as469...@gmail.com>
Subject Re: Giraph Partitioning
Date Wed, 25 Feb 2015 18:56:21 GMT
Thanks, Matthew, for your replies! They are quite helpful. Regarding question
number 4, I see that Maja committed a PartitionContext class here:
http://mail-archives.apache.org/mod_mbox/giraph-commits/201302.mbox/%3C20130209001122.DDAD73ACE5@tyr.zones.apache.org%3E
but it seems to have been removed from the current version. Is that the case?


On Wed, Feb 25, 2015 at 3:30 AM, Matthew Saltz <saltzm@gmail.com> wrote:

> Hi,
>
> 1) The partitions are processed in parallel, with as many running at once as
> the number of threads you specify. The vertices within a partition are
> processed sequentially. You may want to use more partitions than threads;
> that way, if one partition takes a particularly long time to process, the
> other threads can continue processing the remaining partitions. For example,
> if you have four machines with 12 threads each and one worker per machine,
> the default number of partitions will be 4^2 = 16, whereas you actually have
> 48 threads available, so you'd probably want to set the number of
> partitions manually to something larger to take advantage of the parallelism.
> 2) Yes
> 3) If you are only using a single thread per worker, there's no reason to use
> multiple partitions per worker.
> 3, the second one) I'm not familiar with the out-of-core functionality.
> 4) I'm not sure.
>
> I'm basing this on the version of Giraph from this summer, not the most
> recent release, but I don't think this part has changed. You may want to
> verify this by looking at the code.
>
> Best,
> Matthew
>
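Matthew's answers to questions 1 and 3 can be illustrated with a small scheduling model. This is a minimal sketch, not Giraph's code: it assumes that each compute thread on a worker repeatedly takes the next unprocessed partition from a shared queue and processes that partition's vertices one after another. The function name `superstep_time` and the per-partition costs are hypothetical.

```python
import heapq
from typing import List


def superstep_time(partition_costs: List[int], num_threads: int) -> int:
    """Hypothetical model of one worker's compute phase.

    Each of `num_threads` threads takes the next partition from a shared
    queue (in list order) as soon as it is free, and processes all vertices
    of that partition sequentially. `partition_costs[i]` is the total time
    needed for partition i. Returns the time when the last thread finishes.
    """
    if num_threads < 1:
        raise ValueError("need at least one compute thread")
    # Min-heap of the times at which each thread becomes free
    # (a list of zeros is already a valid heap).
    free_at = [0] * num_threads
    for cost in partition_costs:
        # The thread that frees up earliest grabs the next queued partition.
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost)
    return max(free_at)


# The same 76 units of vertex work on a worker with 4 threads:
coarse = [40, 12, 12, 12]      # 4 partitions, one of them slow
fine = [10] * 4 + [3] * 12     # the same work spread over 16 partitions

print(superstep_time(coarse, 4))  # 40: three threads finish at 12, then wait
print(superstep_time(fine, 4))    # 19: free threads pick up remaining partitions
print(superstep_time(coarse, 1), superstep_time(fine, 1))  # 76 76
```

The even split of the slow partition is an idealization. The point is that smaller partitions give otherwise idle threads something to pick up, while with a single thread the total time is the sum of all partition costs however the work is divided.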
> On Wed, Feb 25, 2015 at 3:25 AM, Arjun Sharma <as469613@gmail.com> wrote:
>
>> Hi,
>>
>> I understand that by default, the number of partitions = number of
>> workers ^ 2. So, if we have N workers, each worker will process N
>> partitions. I have a number of questions:
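The default rule described here, combined with Matthew's four-machine example above, comes down to a little arithmetic. The sketch below only encodes the rule as stated in this thread (partitions = workers^2, spread evenly so each of N workers holds N partitions); the actual default may depend on the Giraph version and configuration, and the function name `default_layout` is hypothetical.

```python
def default_layout(num_workers: int, threads_per_worker: int) -> dict:
    """Encode the default rule as described in this thread:
    total partitions = num_workers ** 2, spread evenly over the workers."""
    total_partitions = num_workers ** 2
    partitions_per_worker = total_partitions // num_workers  # equals num_workers
    # A thread with no partition left to take has nothing to compute.
    busy_threads = min(threads_per_worker, partitions_per_worker)
    return {
        "total_partitions": total_partitions,
        "partitions_per_worker": partitions_per_worker,
        "total_threads": num_workers * threads_per_worker,
        "idle_threads_per_worker": threads_per_worker - busy_threads,
    }


# Four machines, one worker each, 12 compute threads per worker:
print(default_layout(4, 12))
# {'total_partitions': 16, 'partitions_per_worker': 4, 'total_threads': 48, 'idle_threads_per_worker': 8}
```

Under this model, at most 4 of each worker's 12 threads have a partition to work on, and a manually chosen count of at least 48 (4 workers x 12 threads) would give every thread at least one partition.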
>>
>> 1- By default, does Giraph process the N partitions within a single
>> worker sequentially? If yes, when setting the parameter
>> giraph.numComputeThreads, will partitions within each thread be computed
>> sequentially?
>>
>> 2- By default, does Giraph keep all partitions in memory?
>>
>> 3- If the answers to 1 and 2 are yes and yes, is there any advantage from
>> using multiple partitions versus a single partition in the case of single
>> threading per worker?
>>
>> 3- How do out-of-core partitions affect out-of-core messages? Are
>> they completely independent? For example, if the number of partitions to be
>> kept in memory is set to a number less than N, and at the same time all
>> messages are set to be kept in memory, will ALL messages be kept in memory,
>> even those from out-of-core partitions? If the situation is reversed, where
>> all partitions are kept in memory and out-of-core messaging is enabled, will
>> messages from memory-based partitions be saved on disk?
>>
>> 4- Is there a class like a PartitionContext, where you can access
>> preSuperstep and postSuperstep *per partition*, along the lines of
>> WorkerContext?
>>
>>
>
