giraph-user mailing list archives

From Arjun Sharma
Subject Number of workers vs number of threads
Date Mon, 13 Jul 2015 08:56:53 GMT

Many of the discussions on this forum suggest using one worker per physical
machine and increasing the number of threads per worker, rather than using
multiple workers per physical machine with fewer threads each. My
experiments do not bear this out.

The cluster I am using has 12 physical machines (used exclusively for
workers), each with 64 GB of RAM and 12 cores. I experimented with two setups:

Setup 1 runs 72 workers (i.e., 6 workers per machine), 72*72 partitions,
which is the default, and 8 threads per worker.

Setup 2 tries to simulate Setup 1, but using threads instead of workers.
It therefore has 12 workers (1 worker per machine) and 72*72 partitions
(set using numUserPartitions). Since the number of parallel tasks per
machine in Setup 1 is 6 workers * 8 threads, the number of compute,
input, and output threads is set to 48.

In both cases, 56 GB of RAM per machine is divided equally among the workers
on that machine (either given to the 1 worker on that machine or divided
among the 6 workers).
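
To make the comparison concrete, here is a minimal sketch of the arithmetic behind the two setups, using only the figures quoted in this message. The names (Setup, summarize) are made up for illustration and are not Giraph APIs, and the partitions-per-worker figure assumes partitions are spread evenly across workers.

```python
from dataclasses import dataclass

# Figures taken from the message above.
MACHINES = 12
CORES_PER_MACHINE = 12
WORKER_RAM_GB_PER_MACHINE = 56
TOTAL_PARTITIONS = 72 * 72  # 5184 in both setups


@dataclass
class Setup:
    """Hypothetical helper type, not part of Giraph."""
    workers_per_machine: int
    threads_per_worker: int


def summarize(s: Setup) -> dict:
    total_workers = MACHINES * s.workers_per_machine
    threads_per_machine = s.workers_per_machine * s.threads_per_worker
    return {
        "total_workers": total_workers,
        "threads_per_machine": threads_per_machine,
        # How many threads compete for each physical core.
        "threads_per_core": threads_per_machine / CORES_PER_MACHINE,
        # Memory available to each worker process.
        "ram_per_worker_gb": round(
            WORKER_RAM_GB_PER_MACHINE / s.workers_per_machine, 2
        ),
        # Assumption: partitions are distributed evenly across workers.
        "partitions_per_worker": TOTAL_PARTITIONS // total_workers,
    }


setup1 = summarize(Setup(workers_per_machine=6, threads_per_worker=8))
setup2 = summarize(Setup(workers_per_machine=1, threads_per_worker=48))

for name, summary in (("Setup 1", setup1), ("Setup 2", setup2)):
    print(name, summary)
```

Both setups come out at 48 threads per machine (4 per core), so what differs is the memory per worker process (about 9.33 GB versus 56 GB) and, under the even-distribution assumption, the number of partitions each worker owns (72 versus 432). This only lists the quantities that differ; it does not by itself explain the difference in running time.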

In my case, Setup 1 performs significantly better (faster) than Setup 2,
which seems counterintuitive and contradicts the usual advice to use
fewer workers and more threads. Is there anything I am missing here? Is
there any tuning or configuration parameter that can make Setup 2
outperform Setup 1?

