giraph-user mailing list archives

From Martin Neumann <mneum...@spotify.com>
Subject Changing index of a graph
Date Tue, 15 Apr 2014 13:33:25 GMT
Hej,

I have a huge edge list (several billion edges) where the node IDs are URLs.
The algorithm I want to run needs the IDs to be longs, and there must be
no holes in the ID space (so I can't simply hash the URLs).

Is anyone aware of a simple solution that does not require an impractically
large hash map?

My current idea is to load the graph in a separate Giraph job and assign
a number to each node there. This way the number-to-URL mapping would be
stored in the node itself.
The problem is that I have to assign the numbers sequentially to ensure
that there are no holes and that the numbers are unique. I have no idea
whether this is even possible in Giraph.
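
To make the sequential-numbering idea concrete, here is a rough sketch in
plain Python rather than Giraph. It assumes the nodes are split into
partitions, each partition counts its own nodes, and a prefix sum over those
counts gives every partition a starting offset. The function names, the
CRC32-based partitioning and the partition count are illustrative
assumptions, not anything Giraph provides; how the counts would be collected
inside a Giraph job (perhaps through aggregators) is left open.

```python
import zlib
from itertools import accumulate


def partition_of(url, num_partitions):
    # Hypothetical partitioner: a stable hash so the same URL always
    # lands in the same partition, regardless of process or run.
    return zlib.crc32(url.encode("utf-8")) % num_partitions


def assign_dense_ids(edges, num_partitions=4):
    # Pass 1: every partition collects the distinct URLs it owns.
    partitions = [set() for _ in range(num_partitions)]
    for src, dst in edges:
        for url in (src, dst):
            partitions[partition_of(url, num_partitions)].add(url)

    # Exclusive prefix sum over partition sizes: offsets[i] is the first
    # ID that partition i may hand out.
    sizes = [len(p) for p in partitions]
    offsets = [0] + list(accumulate(sizes))[:-1]

    # Pass 2: each partition numbers its own URLs starting at its offset.
    # Only the offsets are shared; in a distributed run each partition
    # would keep its own slice of this mapping instead of one big dict.
    mapping = {}
    for offset, urls in zip(offsets, partitions):
        for local_index, url in enumerate(sorted(urls)):
            mapping[url] = offset + local_index
    return mapping


def relabel(edges, mapping):
    # Rewrite the edge list with the dense integer IDs.
    return [(mapping[src], mapping[dst]) for src, dst in edges]
```

With N distinct URLs the IDs come out as exactly 0..N-1, because the ranges
[offset, offset + size) of the partitions are adjacent and do not overlap.
The only global state is one count per partition. Relabelling the edges at
scale would still need a join between the edge list and the mapping, which
this sketch does in memory.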

Any input is welcome.

Cheers,
Martin
