giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: What a "worker" really is and other interesting runtime information
Date Tue, 27 Nov 2012 19:57:58 GMT
Hi Alexandros,

The extra task is for the master process (a coordination task). Since
you are running on a single machine, you can run the master and the
worker in a single task:

-Dgiraph.SplitMasterWorker=false

You can also try multithreading instead of multiple workers:

-Dgiraph.numComputeThreads=12

CPU usage increases because of the Netty threads that handle network
requests. By using multithreading instead of multiple workers, you
should bypass this.
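
A rough sketch of how the two options above could be combined on one
command line. Only the two -D options come from this message; the jar
name and the JOB_ARGS placeholder are assumptions for illustration, and
the script prints the command rather than running it:

```shell
#!/bin/sh
# Hypothetical values: adjust the jar name and job arguments for your setup.
GIRAPH_JAR="giraph-examples.jar"   # assumed jar name, not from the original message
COMPUTE_THREADS=12                 # one compute thread per core on the 12-core node
JOB_ARGS=""                        # placeholder: vertex class, input/output options

# The two options discussed above: run master and worker in one task,
# and use compute threads instead of additional workers.
GIRAPH_OPTS="-Dgiraph.SplitMasterWorker=false -Dgiraph.numComputeThreads=${COMPUTE_THREADS}"

# -D options are passed as generic Hadoop options, before the job arguments;
# -w 1 requests a single worker.
CMD="hadoop jar ${GIRAPH_JAR} org.apache.giraph.GiraphRunner ${GIRAPH_OPTS} ${JOB_ARGS} -w 1"

# Print the assembled command for inspection instead of executing it.
echo "${CMD}"
```

With a single worker and the master folded into it, the job should need
only one map task instead of two.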

Avery

On 11/27/12 9:40 AM, Alexandros Daglis wrote:
> Hello everybody,
>
> I went through most of the documentation I could find for Giraph and 
> also most of the messages in this email list, but still I have not 
> figured out precisely what a "worker" really is. I would really 
> appreciate it if you could help me understand how the framework works.
>
> At first I thought that a worker has a one-to-one correspondence to a 
> map task. Apparently this is not exactly the case, since I have 
> noticed that if I ask for x workers, the job finishes after having 
> used x+1 map tasks. What is this extra task for?
>
> I have been trying out the example SSSP application on a single node 
> with 12 cores. Giving an input graph of ~400MB and using 1 worker, 
> around 10 GB of memory is used during execution. What intrigues me 
> is that if I use 2 workers for the same input (and without limiting 
> memory per map task), double the memory will be used. Furthermore, 
> there will be no improvement in performance. I rather notice a 
> slowdown. Are these observations normal?
>
> Might it be the case that 1 and 2 workers are very few and I should go 
> to the 30-100 range that is the proposed number of mappers for a 
> conventional MapReduce job?
>
> Finally, a last observation. Even though I use only 1 worker, I see 
> that there are significant periods during execution where up to 90% of 
> the 12 cores' computing power is consumed, that is, almost 10 cores are 
> used in parallel. Does each worker spawn multiple threads and 
> dynamically balance the load to utilize the available hardware?
>
> Thanks a lot in advance!
>
> Best,
> Alexandros
>
>

