giraph-user mailing list archives

From Alexandros Daglis <>
Subject Re: What a "worker" really is and other interesting runtime information
Date Wed, 28 Nov 2012 10:13:06 GMT
Thank you Avery, that helped a lot!


On 27 November 2012 20:57, Avery Ching <> wrote:

> Hi Alexandros,
> The extra task is for the master process (a coordination task). In your
> case, since you are using a single machine, you can use a single task.
> -Dgiraph.SplitMasterWorker=false
> and you can try multithreading instead of multiple workers.
> -Dgiraph.numComputeThreads=12
> CPU usage increases because of the Netty threads that handle network
> requests. By using multithreading instead, you should bypass this.
> Avery
> On 11/27/12 9:40 AM, Alexandros Daglis wrote:
>> Hello everybody,
>> I went through most of the documentation I could find for Giraph and also
>> most of the messages in this email list, but still I have not figured out
>> precisely what a "worker" really is. I would really appreciate it if you
>> could help me understand how the framework works.
>> At first I thought that a worker has a one-to-one correspondence to a map
>> task. Apparently this is not exactly the case, since I have noticed that if
>> I ask for x workers, the job finishes after having used x+1 map tasks. What
>> is this extra task for?
>> I have been trying out the example SSSP application on a single node with
>> 12 cores. Given an input graph of ~400 MB and 1 worker, around 10 GB
>> of memory is used during execution. What intrigues me is that if I use 2
>> workers for the same input (and without limiting memory per map task),
>> double the memory is used. Furthermore, performance does not improve;
>> if anything, I notice a slowdown. Are these observations normal?
>> Might it be the case that 1 and 2 workers are too few and I should go to
>> the 30-100 range that is the proposed number of mappers for a conventional
>> MapReduce job?
>> Finally, a last observation. Even though I use only 1 worker, I see that
>> there are significant periods during execution where up to 90% of the 12
>> cores' computing power is consumed, that is, almost 10 cores are used in
>> parallel. Does each worker spawn multiple threads and dynamically balance
>> the load to utilize the available hardware?
>> Thanks a lot in advance!
>> Best,
>> Alexandros
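
The task accounting discussed in this thread can be written out as a small sketch. It is only an illustration of what the messages above state: asking for x workers occupies x+1 map tasks because one task is reserved for the master, and the two `-D` options are the ones Avery quotes. The function names are hypothetical, and the assumption that the master shares a task with a worker whenever `giraph.SplitMasterWorker` is false is generalized from the single-task case mentioned above.

```python
from typing import List


def map_tasks_needed(workers: int, split_master_worker: bool = True) -> int:
    """Map tasks a job occupies under the model described in the thread.

    Hypothetical helper, not part of Giraph.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    # With a split master, one extra task is dedicated to coordination.
    # Assumption: without the split, the master shares a task with a worker.
    return workers + (1 if split_master_worker else 0)


def single_machine_options(compute_threads: int) -> List[str]:
    """The two -D options quoted in the thread, as command-line arguments."""
    if compute_threads < 1:
        raise ValueError("need at least one compute thread")
    return [
        "-Dgiraph.SplitMasterWorker=false",
        f"-Dgiraph.numComputeThreads={compute_threads}",
    ]


if __name__ == "__main__":
    # Two workers with a separate master task: three map tasks.
    print(map_tasks_needed(2))
    # One worker sharing its task with the master: one map task.
    print(map_tasks_needed(1, split_master_worker=False))
    # Options for the 12-core machine discussed in the thread.
    print(" ".join(single_machine_options(12)))
```

On the 12-core machine from the question, the last line prints the two options exactly as they appear in Avery's reply; where they go on the job's command line depends on how the job is launched, which the thread does not show.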
