giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paolo Castagna <>
Subject Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...
Date Mon, 28 May 2012 19:20:55 GMT
Hi Sebastian

Sebastian Schelter wrote:
> I think the code can be improved partially. You don't have to count the
> vertices via an aggregator, you can simply use getNumVertices().

No, you cannot use getNumVertices() since it won't count dangling nodes.
If you want to count for dangling nodes you need to send a message to all

> Why do you only recompute the pageRank in each second superstep? Can we
> not use the aggregated value of the dangling nodes from the last superstep?

Maybe, but I did not managed to do it.

The dangling node contribution needs to be recomputed each time, so the
aggregator needs to be reset to 0 each time and kept unchanged during each
superstep. The value is the sum of the PageRank values of all dangling nodes
in the previous superstep.

The good thing is that now there is a 'correct' implementation and a test
we can use to verify that PageRank values are correct (against a third
party implementation: JUNG).

> Overall I think we're on a good way to a robust, real-world PageRank
> implementation, I managed to implement the convergence check with an
> aggregator, will post an updated patch soon.

Thanks, that was my next step. :-)
+ dumping factor configurable and general clean-up.


> Best,
> Sebastian
> On 28.05.2012 18:39, Paolo Castagna wrote:
>> Paolo Castagna wrote:
>>> Sebastian Schelter wrote:
>>>> I guess that summing up and redistributing the pagerank of dangling
>>>> vertices can also be done without an extra superstep in an aggregator.
>>> Yeah! Why didn't I think about that?
>>> Thanks, great suggestion.
>>> I am going to give this a go, at first without extending RandomWalkVertex since
I want to see how it might work.
>>> This would also inform design and improvements of the current RandomWalkVertex.
>> You can find a 'proper' ;-) implementation of PageRank with dangling nodes
>> support, sum at the end of all PageRank values equals to 1.00 (since it's a
>> probability distribution) and PageRank values validated against a third party
>> implementation (i.e. JUNG), here [1].
>> I have not managed to do it as Sebastian suggested yet.
>> Paolo
>>  [1]

View raw message