giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Junghanns <>
Subject Re: Graph partitioning and data locality
Date Tue, 04 Nov 2014 07:40:10 GMT
sorry for the typo (no coffee yet): vertexID.hashCode() *%* n

On 04.11.2014 08:36, Martin Junghanns wrote:
> Hi group,
> I got a question concerning the graph partitioning step. If I 
> understood the code correctly, the graph is distributed to n 
> partitions by using vertexID.hashCode() & n. I got two questions 
> concerning that step.
> 1) Is the whole graph loaded and partitioned only by the Master? This 
> would mean, the whole data has to be moved to that Master map job and 
> then moved to the physical node the specific worker for the partition 
> runs on. As this sounds like a huge overhead, I further inspected the 
> code:
> I saw that there is also a WorkerGraphPartitioner and I assume he 
> calls the partitioning method on his local data (lets say his local 
> HDFS blocks) and if the resulting partition for a vertex is not 
> himself, the data gets moved to that worker, which reduces the 
> overhead. Is this assumption correct?
> 2) Let's say the graph is already partitioned in the file system, e.g. 
> blocks on physical nodes contain logical connected graph nodes. Is it 
> possible to just read the data as it is and skip the partitioning 
> step? In that case I currently assume, that the vertexID should 
> contain the partitionID and the custom partitioning would be an 
> identity function in that case (instead of hashing or range).
> Thanks for your time and help!
> Cheers,
> Martin

View raw message