It might be better to run a MapReduce job first. In the map phase, emit each key-value pair of a node as the key and the node itself as the value. The reduce phase then collects, under each key, all the nodes that share that key-value pair, which gives you the first level of connections. You can then use the output of the reduce phase as the adjacency list for the graph to be processed with Giraph.
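To make the idea concrete, here is a minimal in-memory sketch of the two phases in plain Python. It does not use the Hadoop or Giraph APIs; the function names (`map_phase`, `reduce_phase`), the node ids, and the sample data are all made up for illustration, and I am assuming each node can be represented as a dict of key-value pairs.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(nodes):
    """Emit ((key, value), node_id) for every key-value pair of every node."""
    for node_id, pairs in nodes.items():
        for kv in pairs.items():
            yield kv, node_id


def reduce_phase(mapped):
    """Group node ids by shared key-value pair, then link the nodes in each group."""
    groups = defaultdict(set)
    for kv, node_id in mapped:
        groups[kv].add(node_id)

    adjacency = defaultdict(set)
    for node_ids in groups.values():
        for node_id in node_ids:
            adjacency[node_id]  # make sure nodes without neighbours still appear
        # Every two nodes sharing this key-value pair get an undirected edge.
        for a, b in combinations(sorted(node_ids), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return {node_id: sorted(neighbours) for node_id, neighbours in adjacency.items()}


# Hypothetical sample data: n1 and n2 share ("color", "red"),
# n3 and n4 share ("size", "L"), n5 shares nothing.
nodes = {
    "n1": {"color": "red", "shape": "circle"},
    "n2": {"color": "red"},
    "n3": {"size": "L"},
    "n4": {"size": "L", "shape": "square"},
    "n5": {"color": "blue"},
}

adjacency = reduce_phase(map_phase(nodes))
for node_id in sorted(adjacency):
    print(node_id, adjacency[node_id])
```

Linking every two nodes in a group is quadratic in the group size. If a key-value pair is shared by very many nodes, linking each node only to one representative of the group (for example the smallest id) should give the same connected components with far fewer edges. How the adjacency list is written out depends on which Giraph vertex input format you choose.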

On Mar 28, 2014 6:27 PM, "Matthieu Labour" wrote:

I am looking for tips on how to leverage Giraph for the use case below:

I have a list of Nodes.
A Node is a collection of Key-Value pairs.
Two Nodes are related (have an edge) if they share a Key-Value pair.

Until now I have been running a Depth First Search algorithm to cluster the Nodes into Connected Components. 

However, my data set has grown significantly and I need to scale. This is what brought me to Giraph.

I have gone through the Connected Components example in Giraph but need a bit of help to get started. Specifically, I wonder how I can change it to accommodate the use case described above.

I would greatly appreciate any help.
Thank you in advance.