giraph-user mailing list archives

From Martin Neumann
Subject Changing index of a graph
Date Tue, 15 Apr 2014 13:33:25 GMT

I have a huge edge list (several billion edges) where the node IDs are URLs.
The algorithm I want to run needs the IDs to be longs, and there must be
no holes in the ID space (so I can't simply hash the URLs).

Is anyone aware of a simple solution that does not require an impractically
huge hash map?

My current idea is to load the graph in a separate Giraph job and assign a
number to each node there. That way the mapping from number to URL would be
stored in the node itself.
The problem is that I have to assign the numbers sequentially to ensure
that there are no holes and that the numbers are unique. I have no idea
whether this is even possible in Giraph.
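
To make the requirement concrete, here is a rough single-process sketch of
one way gap-free numbering could work without a global hash map: every
partition counts its own nodes, an exclusive prefix sum over those counts
gives each partition a starting offset, and each partition then numbers its
nodes locally. This is plain Python, not Giraph code. The names
partition_of and assign_dense_ids are made up for illustration, and the
CRC32 partitioning is only an assumption about how URLs might be spread
over workers. In a real job only the small list of per-partition counts
would have to be shared (in Giraph, presumably through an aggregator).

```python
import zlib
from itertools import accumulate


def partition_of(url, num_partitions):
    # Hypothetical partitioner: a stable hash, so the same URL always
    # lands in the same partition.
    return zlib.crc32(url.encode("utf-8")) % num_partitions


def assign_dense_ids(edges, num_partitions=4):
    # Pass 1: each partition collects the distinct URLs it owns.
    partitions = [set() for _ in range(num_partitions)]
    for src, dst in edges:
        for url in (src, dst):
            partitions[partition_of(url, num_partitions)].add(url)

    # Exclusive prefix sum of the partition sizes: the first ID of each
    # partition is the number of nodes in all earlier partitions.
    sizes = [len(p) for p in partitions]
    offsets = [0] + list(accumulate(sizes))[:-1]

    # Pass 2: each partition numbers its own URLs starting at its offset.
    # The combined dict exists only for this demo; a distributed job would
    # keep each partition's share local.
    mapping = {}
    for offset, urls in zip(offsets, partitions):
        for local_index, url in enumerate(sorted(urls)):
            mapping[url] = offset + local_index
    return mapping


edges = [("a.com", "b.com"), ("b.com", "c.com"), ("d.com", "a.com")]
mapping = assign_dense_ids(edges, num_partitions=3)
print(sorted(mapping.values()))  # [0, 1, 2, 3]
```

Because the offsets are derived from exact counts, the IDs are unique and
cover 0..N-1 with no holes, whichever partition a URL ends up in.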

Any input is welcome.

cheers Martin
