incubator-giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Role of Number of Giraph workers for CPU utilisation and scalability ?
Date Wed, 04 Apr 2012 18:09:59 GMT
Hi Benjamin,

Thanks for sharing your results.  I've commented inline.

Avery

On 4/4/12 9:23 AM, Benjamin Heitmann wrote:
> Hello,
>
> I have a more general question regarding the number of workers:
>
> How does the number of workers relate to the utilisation of the CPU cores on which the Giraph job is run?
Hard to say.  This depends heavily on how well the load is distributed 
between the workers and on the ratio of computation to network traffic.
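To illustrate the load-distribution point: each superstep ends at a 
barrier, so its wall time is set by the slowest worker.  A rough sketch 
in plain Java (not Giraph code, just the arithmetic):

    // Illustration only: a superstep cannot finish before its slowest worker.
    public class SuperstepMakespan {
        // Wall time of one superstep, given per-worker times in milliseconds.
        static long superstepMillis(long[] perWorkerMillis) {
            long max = 0;
            for (long t : perWorkerMillis) {
                max = Math.max(max, t);
            }
            return max; // barrier semantics: the slowest worker sets the pace
        }

        public static void main(String[] args) {
            // One overloaded worker dominates, no matter how many workers run.
            System.out.println(superstepMillis(new long[] {900, 950, 11000, 870}));
        }
    }

With skewed load, adding workers mostly adds idle ones.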
>
> In my situation, if I start the job with 10, 20, or 30 workers, it takes about the same time to finish.
This is an interesting result.  I am going to be working on some network 
improvements that should help this.

> Also the CPUs are used in the same way, which is to say that about 75% of the machine is idle independently of the number of workers.
>
> Is there another way to make Giraph utilise the CPU cores better?
>
> Is there a general explanation for this behaviour of Giraph?
>
> If not, then maybe there is an explanation specific to my job, the Hadoop setup, and the hardware:
> * I use about 6.5 GB of input data
> * Importing the input data takes between 11 and 14 minutes (see the vertex input superstep timing below)

My guess is data skew.  Can you look into the workers during the input 
stage and see how long each vertex input split is taking to load?
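
For example, you could wrap whatever reads a split with a timer.  A 
minimal sketch (the SplitLoader interface below is a hypothetical 
stand-in, not the Giraph reader API):

    import java.util.concurrent.TimeUnit;

    public class TimedSplitLoad {
        // Hypothetical stand-in for whatever loads one vertex input split.
        interface SplitLoader {
            int loadSplit(String splitName) throws Exception; // returns vertex count
        }

        static void loadWithTiming(SplitLoader loader, String splitName)
                throws Exception {
            long start = System.nanoTime();
            int vertices = loader.loadSplit(splitName);
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            // If one split's time dwarfs the rest, the input is skewed.
            System.out.printf("split %s: %d vertices in %d ms%n",
                splitName, vertices, elapsedMs);
        }
    }

If one or two splits account for most of the 11 minutes, that points to 
skew rather than raw I/O.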

> * The Hadoop setup is a single-node / pseudo-distributed Hadoop 1.0.1 installation
> * The machine has 24 cores and 120 GB of RAM, and runs (some form of) Linux
Nice machine! =)

> Is it possible that the vertex input could be parallelised in a better way in Giraph?

Certainly.  This depends on the vertex input format as well, though.
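
The key constraint is that input parallelism is capped by how many 
splits the input format produces: with fewer splits than workers, some 
workers sit idle during loading.  A sketch of the split-planning 
arithmetic (not Giraph's actual VertexInputFormat API):

    import java.util.ArrayList;
    import java.util.List;

    public class SplitPlanner {
        // One {offset, length} range per split; workers load splits independently.
        static List<long[]> planSplits(long fileLength, int numSplits) {
            List<long[]> splits = new ArrayList<long[]>();
            long chunk = (fileLength + numSplits - 1) / numSplits; // ceiling division
            for (long offset = 0; offset < fileLength; offset += chunk) {
                splits.add(new long[] {offset, Math.min(chunk, fileLength - offset)});
            }
            return splits; // fewer splits than workers => idle workers during input
        }
    }

Producing at least as many splits as there are workers, and keeping them 
similar in size, lets every worker participate in the input superstep.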

>
> cheers, Benjamin.
>
> 12/04/02 13:47:29 INFO mapred.JobClient:   Giraph Timers
> 12/04/02 13:47:29 INFO mapred.JobClient:     Total (milliseconds)=828120
> 12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 3 (milliseconds)=11320
> 12/04/02 13:47:29 INFO mapred.JobClient:     Setup (milliseconds)=51446
> 12/04/02 13:47:29 INFO mapred.JobClient:     Shutdown (milliseconds)=14343
> 12/04/02 13:47:29 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=682368 -> ~11 minutes
> 12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 0 (milliseconds)=17495 -> 17 seconds
> 12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 4 (milliseconds)=22737 -> 22 seconds
> 12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 2 (milliseconds)=23395 -> 23 seconds
> 12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 1 (milliseconds)=5013 -> 5 seconds
> 12/04/02 13:47:29 INFO mapred.JobClient:   Giraph Stats
> 12/04/02 13:47:29 INFO mapred.JobClient:     Aggregate edges=61475601
> 12/04/02 13:47:29 INFO mapred.JobClient:     Superstep=5
> 12/04/02 13:47:29 INFO mapred.JobClient:     Last checkpointed superstep=4
> 12/04/02 13:47:29 INFO mapred.JobClient:     Current workers=18
> 12/04/02 13:47:29 INFO mapred.JobClient:     Current master task partition=0
> 12/04/02 13:47:29 INFO mapred.JobClient:     Sent messages=0
> 12/04/02 13:47:29 INFO mapred.JobClient:     Aggregate finished vertices=10430616
> 12/04/02 13:47:29 INFO mapred.JobClient:     Aggregate vertices=10430616

