giraph-user mailing list archives

From "Ing. Alessio Arleo" <ingar...@icloud.com>
Subject Re: Giraph Partitioning
Date Tue, 24 Mar 2015 10:27:44 GMT
Hello everybody.

Almost a month later, I am bumping this topic because there is still no clear answer about
the fate of the PartitionContext class, introduced in Giraph-504 and included in Giraph-1.0.0.
It seems that this feature was not ported to the new version (1.1.0). Although I strongly
believe that the new Giraph design fulfils the purpose of PartitionContext and so makes it
unnecessary, I do not have any evidence to support that.

Does anybody have a clue?

~~~~~~~~~~~~~~~~~~~

Ing. Alessio Arleo

Dottorando in Ingegneria Industriale e dell’Informazione

Dottore Magistrale in Ingegneria Informatica e dell’Automazione
Dottore in Ingegneria Informatica ed Elettronica

Linkedin: it.linkedin.com/in/IngArleo <http://it.linkedin.com/in/IngArleo>
Skype: Ing. Alessio Arleo

Tel: +39 075 5853920
Cell: +39 349 0575782

~~~~~~~~~~~~~~~~~~~



> On 25 Feb 2015, at 19:56, Arjun Sharma <as469613@gmail.com> wrote:
> 
> Thanks Matthew for your replies! They are quite helpful. Regarding question number 4,
> I see a commit of PartitionContext by Maja here:
> http://mail-archives.apache.org/mod_mbox/giraph-commits/201302.mbox/%3C20130209001122.DDAD73ACE5@tyr.zones.apache.org%3E
> but it seems to have been removed from the current version?
> 
> 
> On Wed, Feb 25, 2015 at 3:30 AM, Matthew Saltz <saltzm@gmail.com> wrote:
> Hi,
> 
> 1) The partitions are processed in parallel based on the number of threads you specify.
> The vertices within a partition are processed sequentially. You may want to use more
> partitions than threads, so that if one partition takes a particularly long time to
> process, the other threads can continue processing the remaining partitions. For example,
> if you have four machines with 12 threads each and one worker per machine, the default
> number of partitions will be 4^2 = 16, whereas you actually have 48 threads available, so
> you'd probably want to set the number of partitions manually to a larger number to take
> advantage of the parallelism.
> 2) Yes
> 3) If you are only using a single thread, there's no reason to have multiple partitions
> per worker
> 3 (the second one)) I'm not familiar with the out-of-core functionality
> 4) I'm not sure
> 
> I'm basing this on the version of Giraph from this summer, not the most recent release,
> but I don't think this part has changed. You may want to verify by looking at the code.
> 
> Best,
> Matthew
> 
> On Wed, Feb 25, 2015 at 3:25 AM, Arjun Sharma <as469613@gmail.com> wrote:
> Hi,
> 
> I understand that by default, the number of partitions = number of workers ^ 2. So, if
> we have N workers, each worker will process N partitions. I have a number of questions:
> 
> 1- By default, does Giraph process the N partitions within a single worker sequentially?
> If yes, when setting the parameter giraph.numComputeThreads, will the partitions handled
> by each thread be computed sequentially?
> 
> 2- By default, does Giraph keep all partitions in memory?
> 
> 3- If the answers to 1 and 2 are both yes, is there any advantage to using multiple
> partitions rather than a single partition when each worker runs a single thread?
> 
> 3- How do out-of-core partitions affect out-of-core messages? Are they completely
> independent? For example, if the number of partitions to be kept in memory is set to a
> number less than N, and at the same time all messages are set to be kept in memory, will
> ALL messages be kept in memory, even those from out-of-core partitions? If the situation
> is reversed, where all partitions are kept in memory and out-of-core messaging is set,
> will messages from memory-based partitions be saved on disk?
> 
> 4- Is there a class like a PartitionContext, where you can access preSuperstep and
> postSuperstep *per partition*, along the lines of WorkerContext?
> 
> 
> 
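
The arithmetic in Matthew's answer 1 (four workers, 12 threads each) can be
worked through in a small standalone sketch. It does not use any Giraph classes.
The class name, the method names, and the "partitions per thread" factor are
hypothetical and only illustrate the reasoning. The workers^2 default is taken
from this thread and may differ between Giraph versions. I believe the partition
count can be overridden with an option named giraph.userPartitionCount, but that
is an assumption that should be checked against GiraphConstants in the release
you run.

```java
public class PartitionCountSketch {

    // Default described in this thread: partitions = workers^2.
    // (Taken from the discussion, not verified against a specific release.)
    static int defaultPartitionCount(int workers) {
        return workers * workers;
    }

    // Compute threads available across the whole job.
    static int totalComputeThreads(int workers, int threadsPerWorker) {
        return workers * threadsPerWorker;
    }

    // Hypothetical rule of thumb: give every thread a few partitions, so a
    // slow partition does not leave the other threads idle.
    static int suggestedPartitionCount(int workers, int threadsPerWorker,
                                       int partitionsPerThread) {
        return totalComputeThreads(workers, threadsPerWorker) * partitionsPerThread;
    }

    public static void main(String[] args) {
        int workers = 4;           // one worker per machine, as in the example
        int threadsPerWorker = 12; // giraph.numComputeThreads in the thread

        System.out.println("default partitions:   "
                + defaultPartitionCount(workers));
        System.out.println("compute threads:      "
                + totalComputeThreads(workers, threadsPerWorker));
        System.out.println("suggested partitions: "
                + suggestedPartitionCount(workers, threadsPerWorker, 2));
    }
}
```

With the default, only 16 of the 48 threads would have a partition to work on
at any time, which is why a manually chosen, larger count is suggested above.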

