giraph-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...
Date Tue, 29 May 2012 13:23:49 GMT
Oh sorry, I didn't know about that discussion. The problem I see is that
in every implementation, a user might run into this issue, and I don't
think it's ideal to force users to always run a round of sending empty
messages at the beginning.

Maybe the system should (somehow) automagically do that for the users?
Really seems to be an awkward situation though...

--sebastian



On 29.05.2012 15:03, Claudio Martella wrote:
> About the MapReduce job to prepare the input set, I did advocate for
> this solution instead of supporting automatic creation of non-existent
> vertices implicitly (which I believe adds a logical path in vertex
> resolution that has some drawbacks, e.g., you have to check in the
> hashmap for the existence of the destination vertex for each message,
> which is "fine" now that it's a hashmap, but it's going to be less
> fine when/if we turn to TreeMap for out-of-core).
> 
> Unfortunately the other committers preferred going for the path that
> helps userland's life, so I guess this solution is not to be
> considered here either.
> 
> On Tue, May 29, 2012 at 1:48 PM, Sebastian Schelter <ssc@apache.org> wrote:
>> On 29.05.2012 13:13, Paolo Castagna wrote:
>>> Hi Sebastian
>>>
>>> Sebastian Schelter wrote:
>>>> Why do you only recompute the pageRank in each second superstep? Can we
>>>> not use the aggregated value of the dangling nodes from the last superstep?
>>>
>>> I removed the computing of PageRank values every second superstep.
>>> However, I needed to use a couple of aggregators for the dangling nodes
>>> contribution instead of just one: "dangling-current" and "dangling-previous".
>>>
>>> Each superstep, I need to reset the dangling-current aggregator; at the
>>> same time, I need to know the value of the aggregator at the previous
>>> superstep.
>>
>> You can save the value from the previous step in a static variable in
>> the WorkerContext before resetting the aggregator.
>>
>>>
>>> I hope it makes sense, let me know if you have a better idea.
>>>
>>>> Overall I think we're well on the way to a robust, real-world PageRank
>>>> implementation. I managed to implement the convergence check with an
>>>> aggregator, will post an updated patch soon.
>>>
>>> I think I've just done it, have a look [1] and let me know if you would have
>>> done it differently.
>>>
>>> Paolo
>>>
>>>  [1]
>>> https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90
>>>
>>>
>>
> 
> 
> 
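
The dangling-node bookkeeping discussed in the quoted thread can be
illustrated with a rough sketch. This is a plain-Java simulation of the
superstep loop, not Giraph code and not the implementation from the linked
patch: the class name DanglingPageRankSketch, the run method, and the local
variables danglingCurrent and danglingPrevious are hypothetical stand-ins for
the two aggregators mentioned above. It assumes dangling rank is spread
uniformly over all vertices, which is one common convention.

```java
import java.util.Arrays;

/** Plain-Java simulation of the superstep loop; no Giraph classes are used. */
final class DanglingPageRankSketch {

    /**
     * @param outEdges   outEdges[v] holds the targets of v's out-edges;
     *                   an empty array marks a dangling node
     * @param damping    damping factor, for example 0.85
     * @param supersteps number of rank updates to perform
     */
    static double[] run(int[][] outEdges, double damping, int supersteps) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int step = 0; step < supersteps; step++) {
            double[] inbox = new double[n];
            // Hypothetical stand-in for the "dangling-current" aggregator:
            // it starts from zero in every superstep.
            double danglingCurrent = 0.0;

            for (int v = 0; v < n; v++) {
                if (outEdges[v].length == 0) {
                    // A dangling node has nobody to message, so it aggregates its rank.
                    danglingCurrent += rank[v];
                } else {
                    double share = rank[v] / outEdges[v].length;
                    for (int target : outEdges[v]) {
                        inbox[target] += share;
                    }
                }
            }

            // Superstep barrier: keep the aggregated value before the next reset.
            // Hypothetical stand-in for the "dangling-previous" value.
            double danglingPrevious = danglingCurrent;

            for (int v = 0; v < n; v++) {
                rank[v] = (1.0 - damping) / n
                        + damping * (inbox[v] + danglingPrevious / n);
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Chain 0 -> 1 -> 2, where vertex 2 is dangling.
        int[][] chain = { {1}, {2}, {} };
        System.out.println(Arrays.toString(run(chain, 0.85, 30)));
    }
}
```

Because the dangling mass is handed back to every vertex, the ranks keep
summing to 1 in each superstep. In a real Giraph job the saved value would
live wherever the framework allows state to survive the aggregator reset,
for example the WorkerContext suggested in the thread.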

