incubator-giraph-user mailing list archives

From Benjamin Heitmann <benjamin.heitm...@deri.org>
Subject Role of Number of Giraph workers for CPU utilisation and scalability ?
Date Wed, 04 Apr 2012 16:23:18 GMT
Hello, 

I have a more general question regarding the number of workers: 

How does the number of workers relate to the utilisation of the CPU cores on which the Giraph
job is run ? 

In my situation, if I start the job with 10 workers or 20 or 30 workers, it takes about the
same time to finish. 
CPU usage is also the same in each case: about 75% of the machine is idle
regardless of the number of workers. 

Is there another way to make Giraph utilise the CPU cores better ? 

Is there a general explanation for this behaviour of Giraph ? 

If not, then maybe there is an explanation which is specific to my job, the hadoop setup and
the hardware: 
* I use about 6.5 GB of input data
* Importing the input data takes between 11 and 14 minutes (see the vertex input superstep
timing below) 
* The Hadoop setup is a single-node / pseudo-distributed Hadoop 1.0.1 installation
* The machine has 24 cores and 120 GB of RAM, and runs (some form of) Linux

Is it possible that the vertex input could be parallelised in a better way in Giraph ? 

cheers, Benjamin. 

12/04/02 13:47:29 INFO mapred.JobClient:   Giraph Timers
12/04/02 13:47:29 INFO mapred.JobClient:     Total (milliseconds)=828120
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 3 (milliseconds)=11320
12/04/02 13:47:29 INFO mapred.JobClient:     Setup (milliseconds)=51446
12/04/02 13:47:29 INFO mapred.JobClient:     Shutdown (milliseconds)=14343
12/04/02 13:47:29 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=682368 -> ~11 minutes
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 0 (milliseconds)=17495 -> 17 seconds
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 4 (milliseconds)=22737 -> 22 seconds
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 2 (milliseconds)=23395 -> 23 seconds
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 1 (milliseconds)=5013 -> 5 seconds
12/04/02 13:47:29 INFO mapred.JobClient:   Giraph Stats
12/04/02 13:47:29 INFO mapred.JobClient:     Aggregate edges=61475601
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep=5
12/04/02 13:47:29 INFO mapred.JobClient:     Last checkpointed superstep=4
12/04/02 13:47:29 INFO mapred.JobClient:     Current workers=18
12/04/02 13:47:29 INFO mapred.JobClient:     Current master task partition=0
12/04/02 13:47:29 INFO mapred.JobClient:     Sent messages=0
12/04/02 13:47:29 INFO mapred.JobClient:     Aggregate finished vertices=10430616
12/04/02 13:47:29 INFO mapred.JobClient:     Aggregate vertices=10430616
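
As a rough way to read the timers above, the following Python sketch computes each phase's share of the total run time and a crude input throughput. The timer values are copied from the log output above; the 6.5 GB figure is the approximate input size mentioned in the message (taken here as decimal GB, which is an assumption), and the helper name `parse_timers` is hypothetical.

```python
import re

# Timer lines copied from the job output above.
LOG = """\
12/04/02 13:47:29 INFO mapred.JobClient:     Total (milliseconds)=828120
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 3 (milliseconds)=11320
12/04/02 13:47:29 INFO mapred.JobClient:     Setup (milliseconds)=51446
12/04/02 13:47:29 INFO mapred.JobClient:     Shutdown (milliseconds)=14343
12/04/02 13:47:29 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=682368
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 0 (milliseconds)=17495
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 4 (milliseconds)=22737
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 2 (milliseconds)=23395
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 1 (milliseconds)=5013
"""

# Matches "<timer name> (milliseconds)=<value>" after the JobClient prefix.
TIMER = re.compile(r"JobClient:\s+(.+?) \(milliseconds\)=(\d+)")


def parse_timers(log):
    """Return a dict mapping timer name to its duration in milliseconds."""
    return {name: int(ms) for name, ms in TIMER.findall(log)}


timers = parse_timers(LOG)
total = timers.pop("Total")

# Fraction of the total run time spent in each phase.
shares = {name: ms / total for name, ms in timers.items()}
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:25s} {share:6.1%}")

# Crude throughput of the input phase, assuming ~6.5 GB (decimal) of input.
INPUT_MB = 6.5 * 1000
mb_per_s = INPUT_MB / (timers["Vertex input superstep"] / 1000)
print(f"vertex input throughput: {mb_per_s:.1f} MB/s")
```

On these numbers the vertex input superstep accounts for roughly 82% of the total run time, so the compute supersteps are a small part of the job regardless of how many workers run them.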