giraph-user mailing list archives

From Alexandros Daglis <alexandros.dag...@epfl.ch>
Subject Re: What a "worker" really is and other interesting runtime information
Date Wed, 28 Nov 2012 10:13:06 GMT
Thank you Avery, that helped a lot!

Regards,
Alexandros

On 27 November 2012 20:57, Avery Ching <aching@apache.org> wrote:

> Hi Alexandros,
>
> The extra task is for the master process (a coordination task). In your
> case, since you are using a single machine, you can use a single task.
>
> -Dgiraph.SplitMasterWorker=false
>
> and you can try multithreading instead of multiple workers.
>
> -Dgiraph.numComputeThreads=12
>
> CPU usage increases because of the Netty threads that handle network
> requests. By using multithreading instead, you should bypass this.
>
> Avery
>
>
> On 11/27/12 9:40 AM, Alexandros Daglis wrote:
>
>> Hello everybody,
>>
>> I went through most of the documentation I could find for Giraph and also
>> most of the messages in this email list, but still I have not figured out
>> precisely what a "worker" really is. I would really appreciate it if you
>> could help me understand how the framework works.
>>
>> At first I thought that a worker has a one-to-one correspondence to a map
>> task. Apparently this is not exactly the case, since I have noticed that if
>> I ask for x workers, the job finishes after having used x+1 map tasks. What
>> is this extra task for?
>>
>> I have been trying out the example SSSP application on a single node with
>> 12 cores. Given an input graph of ~400 MB and 1 worker, around 10 GB
>> of memory is used during execution. What intrigues me is that if I use 2
>> workers for the same input (and without limiting memory per map task),
>> double the memory will be used. Furthermore, there will be no improvement
>> in performance. I rather notice a slowdown. Are these observations normal?
>>
>> Might it be the case that 1 or 2 workers are too few, and that I should go
>> to the 30-100 range that is the proposed number of mappers for a
>> conventional MapReduce job?
>>
>> Finally, a last observation. Even though I use only 1 worker, I see that
>> there are significant periods during execution where up to 90% of the 12
>> cores' computing power is consumed, that is, almost 10 cores are used in
>> parallel. Does each worker spawn multiple threads and dynamically balance
>> the load to utilize the available hardware?
>>
>> Thanks a lot in advance!
>>
>> Best,
>> Alexandros
>>
>>
>>
>
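
The task accounting described in the thread (x workers occupy x+1 map tasks, because the master runs in a task of its own unless giraph.SplitMasterWorker=false) can be sketched as below. This is only an illustration, not Giraph code: the helper names map_tasks_needed and giraph_flags are hypothetical, and applying the unsplit count to more than one worker is an assumption, since the thread only covers the single-worker case.

```python
def map_tasks_needed(workers, split_master_worker=True):
    """Estimate how many map tasks a job occupies (hypothetical helper).

    One task per worker, plus one coordination task for the master when
    the master is split from the workers, as described in the thread.
    """
    if workers < 1:
        raise ValueError("at least one worker is required")
    # Assumption: with the master not split out, it shares a worker's task.
    return workers + 1 if split_master_worker else workers


def giraph_flags(compute_threads, split_master_worker):
    """Build the two -D options quoted in the thread (hypothetical helper)."""
    split = "true" if split_master_worker else "false"
    return [
        "-Dgiraph.SplitMasterWorker=" + split,
        "-Dgiraph.numComputeThreads=" + str(compute_threads),
    ]


if __name__ == "__main__":
    # Two workers with a separate master: three map tasks.
    print(map_tasks_needed(2))
    # Single machine, single task, 12 compute threads, as suggested above.
    print(map_tasks_needed(1, split_master_worker=False))
    print(" ".join(giraph_flags(12, False)))
```

For the single 12-core node in question, this corresponds to one worker in one task with 12 compute threads, rather than several workers that each bring their own network threads.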
