giraph-user mailing list archives

From Benjamin Heitmann <>
Subject Re: Multiple jobs on same graph, aggregator use and LocalRunner issue
Date Wed, 06 Jun 2012 02:10:03 GMT

Hi Clive, 

On 5 Jun 2012, at 22:21, Clive Cox wrote:
> I recently started playing with Giraph and I have a few questions.
> 1. I'm writing a simple spreading activation algorithm

I am also working on a spreading activation algorithm.
My original data is an RDF graph with typed edges and vertices. That is quite far
from the PageRank-style algorithms that Google Pregel, and thus Apache Giraph,
are optimised for.

So I can understand your questions very well. 

> which would be
> run many times over the same graph with different initial vertices
> activated. Doing this as separate jobs in which a potentially large
> graph is loaded each time will be slow. Is there a way to run multiple
> BSP runs over the same loaded graph? 

Sadly, this is not currently possible AFAIK. The Hadoop paradigm is focused on
jobs with a transient graph.

But if enough people speak up about how inefficient it is to throw away
the graph between jobs, maybe some mechanism can be added for running the same algorithm
with different "configurations" on the same graph.

I need to run the same algorithm on the same graph for different user profiles (different
"configurations"), and it was a big challenge to run all of those configurations in parallel
in a single run. In my case, building the graph takes between a quarter and a third of the
total processing time.
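For what it's worth, the basic idea behind running several configurations in one run can be sketched in plain Java. This is not Giraph code: MultiSeedActivation, run() and the decay-based propagation rule are assumptions made for illustration. Each vertex keeps one activation value per configuration (one seed vertex per configuration here), so the graph is built once and every superstep advances all configurations together.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java sketch, not Giraph code: spreading activation for several
 * seed configurations in a single pass. Each vertex keeps one activation
 * value per configuration, so the graph is built once and shared.
 */
public class MultiSeedActivation {

    /**
     * @param adj   adj.get(v) holds the out-neighbours of vertex v
     * @param seeds seeds[c] is the vertex activated in configuration c
     * @param decay fraction of a vertex's newly received activation passed on
     * @param steps number of propagation supersteps to simulate
     * @return total[v][c], the accumulated activation of v in configuration c
     */
    public static double[][] run(List<int[]> adj, int[] seeds, double decay, int steps) {
        int n = adj.size();
        int k = seeds.length;
        double[][] total = new double[n][k];
        double[][] fresh = new double[n][k]; // received but not yet passed on
        for (int c = 0; c < k; c++) {
            fresh[seeds[c]][c] = 1.0; // activate the seed of configuration c
        }
        for (int s = 0; s < steps; s++) {
            double[][] inbox = new double[n][k]; // messages for the next superstep
            for (int v = 0; v < n; v++) {
                int[] out = adj.get(v);
                for (int c = 0; c < k; c++) {
                    total[v][c] += fresh[v][c];
                    if (out.length == 0) continue; // sink: nothing to spread
                    double share = decay * fresh[v][c] / out.length;
                    for (int w : out) {
                        inbox[w][c] += share;
                    }
                }
            }
            fresh = inbox; // superstep barrier: messages become visible
        }
        // Absorb the activation delivered in the last superstep.
        for (int v = 0; v < n; v++) {
            for (int c = 0; c < k; c++) {
                total[v][c] += fresh[v][c];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Directed cycle 0 -> 1 -> 2 -> 0; configuration 0 seeds vertex 0,
        // configuration 1 seeds vertex 2.
        List<int[]> adj = List.of(new int[] {1}, new int[] {2}, new int[] {0});
        double[][] act = run(adj, new int[] {0, 2}, 0.5, 1);
        for (int v = 0; v < act.length; v++) {
            System.out.println(v + " " + Arrays.toString(act[v]));
        }
    }
}
```

With one superstep and decay 0.5, this prints "0 [1.0, 0.5]", "1 [0.5, 0.0]" and "2 [0.0, 1.0]". The trade-off is that vertex values and messages grow linearly with the number of configurations, in exchange for building the graph only once.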

> 2. I might want to normalise the vertex values at the end of a
> superstep. I assume I can use an aggregator to get the sum of the values
> but I'm not sure where I can update all vertex values before the next
> superstep.

The best place right now for coordinating logic that needs knowledge of the whole graph
is the WorkerContext, specifically its pre-superstep method.

In the compute method of a vertex, you can add a value to a Sum/LongSum Aggregator.
Then in the pre-superstep method of the WorkerContext you can check the value of that aggregator.

You can then either reset that same aggregator or set a different one. In the next superstep,
the vertices check that aggregator and retrieve the new normalised value.
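To make that control flow concrete, here is a plain-Java simulation of the pattern. It is not Giraph code: SumAggregator and Context are hypothetical stand-ins for a sum aggregator and for the WorkerContext's pre-superstep hook, and the divide-by-zero guard is my own assumption. Vertices contribute their values in one superstep, the context reads the total between supersteps, and the vertices divide by it in the next one.

```java
import java.util.Arrays;

/**
 * Plain-Java simulation (not Giraph code) of normalising vertex values:
 *   superstep s:   each vertex adds its value to a sum aggregator
 *   between steps: the context reads the sum and publishes it
 *   superstep s+1: each vertex divides its value by the published sum
 */
public class AggregatorNormalisation {

    /** Hypothetical stand-in for a sum aggregator. */
    static final class SumAggregator {
        private double sum;
        void aggregate(double x) { sum += x; }
        double get() { return sum; }
        void reset() { sum = 0.0; }
    }

    /** Hypothetical stand-in for the WorkerContext's pre-superstep hook. */
    static final class Context {
        double normaliser = 1.0; // value the vertices read in the next superstep

        void preSuperstep(SumAggregator agg) {
            double total = agg.get();
            normaliser = (total == 0.0) ? 1.0 : total; // avoid dividing by zero
            agg.reset(); // ready for the next round of contributions
        }
    }

    public static double[] normalise(double[] values) {
        double[] v = values.clone();
        SumAggregator agg = new SumAggregator();
        Context ctx = new Context();

        // Superstep s: every vertex contributes its value.
        for (double x : v) agg.aggregate(x);

        // Barrier: the context reads the aggregate before superstep s+1.
        ctx.preSuperstep(agg);

        // Superstep s+1: every vertex rescales using the published value.
        for (int i = 0; i < v.length; i++) v[i] /= ctx.normaliser;
        return v;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalise(new double[] {1.0, 3.0, 4.0})));
    }
}
```

Running main prints [0.125, 0.375, 0.5]: the values sum to 8, and each vertex divides by that total one superstep later.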

Somebody has started work on a patch for a centralised master that would be able to control
and coordinate the whole graph, but it has not been finished yet. The Jira issue is here:

> 3. On a smaller trivial point: Running within a LocalRunner for
> debugging I need to delete the local zookeeper state created in _bsp*
> folders, otherwise the next run does nothing as it assumes it's the same
> state and just finishes straight away.

I never had that issue, so I can't comment on it.
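If it helps, the workaround you describe (deleting the _bsp* folders before each local run) is easy to automate. A minimal plain-Java sketch: the class name CleanBspState is made up, and it assumes the _bsp* folders sit directly in the directory passed in (the current directory by default).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper: removes stale "_bsp*" entries left behind by a
 * previous local run, assuming they sit directly in the given directory.
 */
public class CleanBspState {

    /** Deletes every entry in dir whose name matches _bsp*; returns how many were removed. */
    public static int clean(Path dir) throws IOException {
        List<Path> targets = new ArrayList<>();
        // Collect first, so the directory is not modified while it is being listed.
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, "_bsp*")) {
            for (Path p : matches) {
                targets.add(p);
            }
        }
        for (Path p : targets) {
            deleteRecursively(p);
        }
        return targets.size();
    }

    /** Deletes a file or a whole directory tree, children before parents. */
    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts "a/b" before "a", so directories are empty when deleted.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        int removed = clean(dir);
        System.out.println("Removed " + removed + " _bsp* entries from " + dir.toAbsolutePath());
    }
}
```

Calling CleanBspState.clean(...) at the start of the debugging harness, before the LocalRunner job is launched, would make each run start from a clean state.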