giraph-user mailing list archives

From Claudio Martella <>
Subject Re: Changing index of a graph
Date Tue, 15 Apr 2014 13:46:07 GMT
The only solution I know of is a so-called dictionary kept outside of
Giraph (this is usually done, e.g., for semantic web graphs, which also have
URIs as IDs), in a datastore like HBase/Cassandra: basically the hashmap you
mention, but held in external storage.
While initially computationally expensive, it allows you to scale in the
long run, because adding an edge is just a matter of incrementing a counter
in the store and adding the mapping.
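
As a rough illustration of the dictionary idea (not code from any real
deployment), here is a minimal Java sketch. It assumes an in-memory map as a
stand-in for the HBase/Cassandra store, and the class UrlDictionary and its
method names are hypothetical. The point is only that a counter plus a
URL-to-ID mapping yields unique long IDs with no holes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the external dictionary store.
// In a real setup the mapping and the counter would live in a datastore
// such as HBase or Cassandra; a local list also caps out at about two
// billion entries, so this is a sketch only.
class UrlDictionary {
    private final Map<String, Long> urlToId = new HashMap<>();
    private final List<String> idToUrl = new ArrayList<>();

    // Returns the existing ID for a URL, or assigns the next unused one.
    // IDs start at 0 and grow by 1, so the ID space has no holes.
    long idFor(String url) {
        Long id = urlToId.get(url);
        if (id == null) {
            id = (long) idToUrl.size(); // the "counter" is the current size
            urlToId.put(url, id);
            idToUrl.add(url);
        }
        return id;
    }

    // Reverse lookup, for translating results back to URLs.
    String urlFor(long id) {
        return idToUrl.get((int) id);
    }

    // Number of distinct URLs seen so far.
    long size() {
        return idToUrl.size();
    }

    // Translates one URL edge into a pair of dense long IDs.
    long[] encodeEdge(String srcUrl, String dstUrl) {
        return new long[] { idFor(srcUrl), idFor(dstUrl) };
    }
}
```

Running every edge of the URL edge list through encodeEdge once would
produce a long-ID edge list that can be loaded into Giraph, and urlFor maps
results back afterwards.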

On Tue, Apr 15, 2014 at 3:33 PM, Martin Neumann <> wrote:

> Hej,
> I have a huge edge list (several billion edges) where node IDs are URLs.
> The algorithm I want to run needs the IDs to be longs, and there should be
> no holes in the ID space (so I can't simply hash the URLs).
> Is anyone aware of a simple solution that does not require an impractically
> huge hash map?
> My idea currently is to load the graph into another Giraph job and then
> assign a number to each node. This way the mapping of number to URL
> would be stored in the node.
> Problem is that I have to assign the numbers in a sequential way to ensure
> there are no holes and numbers are unique. No idea if this is even possible
> in Giraph.
> Any input is welcome
> cheers Martin

   Claudio Martella
