giraph-user mailing list archives

From Hassan Eslami <>
Subject Re: Running Giraph 1.1 on Hadoop 2.7.2
Date Fri, 04 Mar 2016 19:46:35 GMT

1) AFAIK, Giraph does not implement a load-balancing mechanism. However, the
mechanism for partition migration is implemented, so you may want to use it
to implement your own load balancer inside the framework. Take a look at
BspServiceWorker#exchangeVertexPartitions for this purpose.
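
To make the idea concrete, here is a minimal sketch of the planning step such a load balancer might run before handing partitions to the migration mechanism. It is not Giraph code: the class RebalanceSketch, the method planMove, and the use of vertex counts as the load metric are all assumptions made for illustration, and the actual migration would still have to go through the framework.

```java
import java.util.Arrays;

public class RebalanceSketch {
    /**
     * Hypothetical planning step, not part of Giraph: picks one partition to
     * move from the most loaded worker to the least loaded one.
     *
     * @param owner    owner[p] is the worker currently holding partition p
     * @param vertices vertices[p] is the vertex count of partition p (assumed load metric)
     * @param workers  number of workers
     * @return {partition, fromWorker, toWorker}, or null if no move helps
     */
    static int[] planMove(int[] owner, long[] vertices, int workers) {
        // Sum the vertex counts of the partitions held by each worker.
        long[] load = new long[workers];
        for (int p = 0; p < owner.length; p++) {
            load[owner[p]] += vertices[p];
        }
        // Find the most and least loaded workers.
        int heavy = 0;
        int light = 0;
        for (int w = 1; w < workers; w++) {
            if (load[w] > load[heavy]) heavy = w;
            if (load[w] < load[light]) light = w;
        }
        long gap = load[heavy] - load[light];
        // Moving a partition of size v changes the gap to |gap - 2v|, so the
        // move only helps when 0 < v < gap. Pick the largest such partition.
        int best = -1;
        for (int p = 0; p < owner.length; p++) {
            if (owner[p] == heavy && vertices[p] > 0 && vertices[p] < gap
                    && (best < 0 || vertices[p] > vertices[best])) {
                best = p;
            }
        }
        return best < 0 ? null : new int[] {best, heavy, light};
    }

    public static void main(String[] args) {
        int[] owner = {0, 0, 0, 1};
        long[] vertices = {50, 30, 20, 10};
        // Worker 0 holds 100 vertices, worker 1 holds 10.
        System.out.println(Arrays.toString(planMove(owner, vertices, 2))); // prints [0, 0, 1]
    }
}
```

A real balancer would probably weigh processing time or message volume rather than vertex counts alone, and would repeat this step until no move helps.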

2) i. Look at PartitionUtils#computePartitionCount. Generally, if you have
n machines, the number of partitions will be n*n (each worker gets n
partitions). You can set the total number of partitions with the flag
-Dgiraph.userPartitionCount (for instance, -Dgiraph.userPartitionCount=100
gives 100 partitions in total).
ii. The number of partitions generally remains constant throughout the
computation. It is computed once at the beginning and stays the same for
the rest of the computation.
iii. There are statistics (such as how many vertices each partition has and
how much time it took to process each partition; see, for instance, the
PartitionStats class), but they are mostly used for logging.
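
As a rough illustration of the rule of thumb in 2) i., the sketch below computes the total partition count from the number of workers and an optional user override. It is a simplification under stated assumptions, not a copy of PartitionUtils#computePartitionCount: the class PartitionCountSketch, the method partitionCount, and the convention that -1 means "flag not set" are hypothetical, and the real method may apply additional limits.

```java
public class PartitionCountSketch {
    /**
     * Hypothetical helper, not Giraph's API: returns the total partition count
     * under the n*n rule of thumb, unless the user supplied an explicit count.
     *
     * @param workers            number of worker machines (n)
     * @param userPartitionCount value passed via giraph.userPartitionCount,
     *                           or -1 if the flag is not set (assumed convention)
     */
    static int partitionCount(int workers, int userPartitionCount) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        if (userPartitionCount > 0) {
            // An explicit user setting fixes the total number of partitions.
            return userPartitionCount;
        }
        // Default rule of thumb: n partitions per worker, n*n in total.
        return workers * workers;
    }

    public static void main(String[] args) {
        System.out.println(partitionCount(4, -1));  // prints 16
        System.out.println(partitionCount(4, 100)); // prints 100
    }
}
```

Since the count is computed once at the start, a call like this would run a single time per job rather than once per superstep.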


On Thu, Mar 3, 2016 at 1:46 PM, Anirudh Perugu <> wrote:

> Hi,
> I am a giraph newbie & have read how giraph works but I have a couple of
> questions.
> 1. If a machine has too much work to do, is it possible to migrate work to
> another machine for faster computation? (or is this handled by partitions
> from the master)
> (Plz view the diagram below)
> 2. i. How are the number of partitions decided?
> ii. What kind of Statistics are stored, how do they help the master to
> choose the number of partitions for the next superstep?
> iii. These statistics are in memory (because they cannot be to the disk),
> am I correct?
> [image: Inline image 2]
> Thanks,
> Anirudh
