giraph-user mailing list archives

From Martin Junghanns
Subject Graph partitioning and data locality
Date Tue, 04 Nov 2014 07:36:16 GMT
Hi group,

I have two questions about the graph partitioning step. If I understood 
the code correctly, the graph is distributed across n partitions by 
computing vertexID.hashCode() % n.
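
To make clear what I mean by hash partitioning, here is a minimal sketch, assuming the operation is a modulo over the partition count. The class and method names are my own and this is not Giraph's actual code; I use Math.floorMod so that negative hash codes still map to a valid partition index.

```java
final class HashPartitionSketch {
    /** Maps a vertex ID to a partition index in [0, partitionCount). */
    static int partitionFor(Object vertexId, int partitionCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(vertexId.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int partitionCount = 4; // hypothetical number of partitions
        for (long id : new long[] {0L, 1L, 5L, 42L}) {
            System.out.println("vertex " + id + " -> partition "
                    + partitionFor(id, partitionCount));
        }
    }
}
```

With this scheme the partition of a vertex depends only on its ID, not on where the vertex was read from.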

1) Is the whole graph loaded and partitioned only by the Master? That 
would mean all of the data has to be moved to that Master map task and 
then moved again to the physical node on which the worker for each 
partition runs. As this sounds like a huge overhead, I inspected the code 
further: I saw that there is also a WorkerGraphPartitioner, and I assume 
it calls the partitioning method on its local data (let's say its local 
HDFS blocks). If the resulting partition for a vertex does not belong to 
that worker, the data is moved to the owning worker, which reduces the 
overhead. Is this assumption correct?
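
The behaviour I am assuming in 1) can be sketched as follows. This only illustrates my assumption and does not describe what Giraph really does: every name here is hypothetical, and I assume one partition per worker to keep it short. Each worker groups the vertices from its own input split by owner, keeps its own group, and would send the other groups over the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WorkerSideRoutingSketch {
    /** Owning worker of a vertex (assumption: one partition per worker). */
    static int ownerOf(long vertexId, int workerCount) {
        return Math.floorMod(Long.hashCode(vertexId), workerCount);
    }

    /** Groups the vertices read from one worker's local split by owning worker. */
    static Map<Integer, List<Long>> route(long[] localSplit, int workerCount) {
        Map<Integer, List<Long>> byOwner = new TreeMap<>();
        for (long vertexId : localSplit) {
            byOwner.computeIfAbsent(ownerOf(vertexId, workerCount),
                    k -> new ArrayList<>()).add(vertexId);
        }
        return byOwner;
    }

    /** Counts the vertices that would have to leave the worker that read them. */
    static int remoteCount(int selfId, Map<Integer, List<Long>> byOwner) {
        int count = 0;
        for (Map.Entry<Integer, List<Long>> entry : byOwner.entrySet()) {
            if (entry.getKey() != selfId) {
                count += entry.getValue().size();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] splitOfWorker0 = {0L, 1L, 2L, 3L, 4L, 5L}; // hypothetical local split
        Map<Integer, List<Long>> routed = route(splitOfWorker0, 3);
        System.out.println("grouped by owner: " + routed);
        System.out.println("vertices to send: " + remoteCount(0, routed));
    }
}
```

Under this model only the vertices owned by another worker cross the network, and nothing passes through the Master.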

2) Let's say the graph is already partitioned in the file system, e.g. 
the blocks on each physical node contain logically connected graph 
nodes. Is it possible to just read the data as it is and skip the 
partitioning step? In that case I currently assume that the vertexID 
would have to contain the partitionID, and the custom partitioning would 
be an identity function (instead of hashing or range partitioning).
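
As an illustration of what I mean by a vertexID that contains the partitionID, here is a minimal sketch. The bit layout (upper 16 bits for the partition, lower 48 bits for a local ID) and all names are assumptions of mine, and wiring this into a custom Giraph partitioner is not shown.

```java
final class EmbeddedPartitionIdSketch {
    // Hypothetical layout: upper 16 bits = partition, lower 48 bits = local ID.
    static final int LOCAL_BITS = 48;
    static final long LOCAL_MASK = (1L << LOCAL_BITS) - 1;

    /** Packs a partition ID and a partition-local ID into one long vertex ID. */
    static long encode(int partitionId, long localId) {
        if (partitionId < 0 || partitionId > 0xFFFF
                || localId < 0 || localId > LOCAL_MASK) {
            throw new IllegalArgumentException("ID component out of range");
        }
        return ((long) partitionId << LOCAL_BITS) | localId;
    }

    /** "Identity" partitioning: the partition is read from the ID, not computed by hashing. */
    static int partitionOf(long vertexId) {
        return (int) (vertexId >>> LOCAL_BITS);
    }

    /** Recovers the partition-local part of the vertex ID. */
    static long localIdOf(long vertexId) {
        return vertexId & LOCAL_MASK;
    }

    public static void main(String[] args) {
        long vertexId = encode(3, 17L);
        System.out.println("partition = " + partitionOf(vertexId)
                + ", local id = " + localIdOf(vertexId));
    }
}
```

The partitioner would then only have to call something like partitionOf, so a vertex stays in the partition it was assigned to when the input was written.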

Thanks for your time and help!

