giraph-user mailing list archives

From Foad Lotfifar <foad...@gmail.com>
Subject Scalability issue on Giraph
Date Mon, 14 Dec 2015 11:14:42 GMT
Hi,

I have a scalability issue with Giraph and I cannot find out where the 
problem is.

--- Cluster specs:
# nodes             1
# threads          32
Processor          Intel Xeon 2.0GHz
OS                    ubuntu 32bit
RAM                 64GB

--- Giraph specs
Hadoop            Apache Hadoop 1.2.1
Giraph              1.2.0 Snapshot

Tested Graphs:
amazon0302                V=262,111, E=1,234,877
coAuthorsCiteseer        V=227,320, E=1,628,268


I run the PageRank algorithm provided with Giraph, 
"SimplePageRankComputation", with the following options:

(time ($HADOOP_HOME/bin/hadoop jar \
  $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dgiraph.graphPartitionerFactoryClass=org.apache.giraph.partition.HashRangePartitionerFactory \
  org.apache.giraph.examples.PageRankComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/hduser/input/$file \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/hduser/output/pagerank -w $2 \
  -mc org.apache.giraph.examples.PageRankComputation\$PageRankMasterCompute)) 2>&1 \
  | tee -a ./pagerank_results/$file.GHR_$2.$iter.output.txt

The algorithm runs without any issue. The number of supersteps is set to 
31 by default in the algorithm.

*Problem:*
The run does not scale beyond 8 (or 16) processor cores: I get speedup 
up to 8 (or 16) cores, and after that the run time starts to increase.

I have also run PageRank with only one superstep, and I have run other 
algorithms such as the ShortestPath algorithm. I get the same results, 
and I cannot figure out where the problem is.

1- I have tried changing two options, giraph.numInputThreads and 
giraph.numOutputThreads: the performance gets a little better, but 
there is no impact on scalability.
2- Is it related to the size of the graphs? The graphs I am testing 
are small.
3- Is it a platform-related issue?
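
To put question 2 in concrete terms, the following minimal Python sketch divides the vertex and edge counts listed under "Tested Graphs" by each thread count. It assumes a perfectly even split across threads, which is a simplification for illustration and not how Giraph necessarily distributes partitions; the helper name `per_thread` is hypothetical.

```python
# Vertex and edge counts copied from the "Tested Graphs" list above.
graphs = {
    "amazon0302": (262_111, 1_234_877),
    "coAuthorsCiteseer": (227_320, 1_628_268),
}

# Thread counts taken from the "# Processor cores" header of the timing table.
thread_counts = [1, 2, 4, 8, 16, 24, 32]


def per_thread(count, threads):
    """Average number of items per thread, ASSUMING a perfectly even split."""
    return count / threads


for name, (vertices, edges) in graphs.items():
    print(name)
    for threads in thread_counts:
        v = per_thread(vertices, threads)
        e = per_thread(edges, threads)
        print(f"  {threads:>2} threads: ~{v:,.0f} vertices, ~{e:,.0f} edges each")
```

Under this even-split assumption, each of 32 threads would handle roughly 8,000 vertices of amazon0302 per superstep, which gives a sense of how little work there is to spread out.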

These are the timing details for the amazon graph:

# Processor cores    1  2  4  8  16  24  32

Input             3260       3447       3269       3921       4555       4766
Initialise        3467      36458      45474      39091     100281      79012
Setup               34         52         59         70         77         86
Shutdown          9954      10226      11021       9524      13393      15930
Total           135482      84483      61081      52190      58921      61898

HDFS READ     21097485   26117723   36158199   57808783   80086015  102163071
FILE WRITE       65889     109815     197667     373429     549165     724901
HDFS WRITE     7330986    7331068    7331093    7330988    7330976    7331203
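
As a rough illustration of the speedup described under *Problem*, the minimal Python sketch below computes the speedup of each run relative to the first value in the Total row. It indexes runs by position only, because the header lists seven core counts while each row has six values, so no core count is assigned to a value here; the function name `relative_speedup` is hypothetical.

```python
# Total run times copied from the "Total" row above, in the order listed.
total = [135482, 84483, 61081, 52190, 58921, 61898]


def relative_speedup(times):
    """Speedup of every run relative to the first run in the list."""
    base = times[0]
    return [base / t for t in times]


speedup = relative_speedup(total)

# Position (0-based) of the fastest run, i.e. the smallest total time.
best = min(range(len(total)), key=total.__getitem__)

for i, (t, s) in enumerate(zip(total, speedup)):
    marker = "  <- fastest" if i == best else ""
    print(f"run {i + 1}: total={t:>7}  speedup={s:.2f}{marker}")
```

The fastest run is the fourth value in the row, about 2.6 times faster than the first, and the two runs after it are slower again.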



Best Regards,
Karos



