giraph-user mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...
Date Tue, 29 May 2012 13:54:40 GMT
I'm not sure users will need to send those empty messages in the first
superstep. The missing vertices will be created and used in the second
superstep if necessary. If users need them in the first superstep, then
I guess they'll add them as lines in the input file.
I agree with you that this is kind of messed up :)
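
To make the vertex-resolution path concrete, here is a minimal sketch in plain Java. It is not Giraph's actual code: the class VertexResolutionSketch and all of its methods are hypothetical names invented for illustration. It only shows the idea discussed in this thread: every delivered message requires a lookup in the vertex map, and a destination that was never listed in the input file is created on delivery, so it exists for the following superstep.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of implicit vertex creation; hypothetical, not Giraph's classes. */
public class VertexResolutionSketch {
    /** Vertex id -> messages waiting for that vertex in the next superstep. */
    private final Map<String, List<Double>> inbox = new HashMap<>();
    private int createdImplicitly = 0;

    /** Register a vertex that appeared as a line in the input file. */
    public void addFromInput(String id) {
        inbox.putIfAbsent(id, new ArrayList<>());
    }

    /** Deliver a message; this is the per-message existence check. */
    public void deliver(String targetId, double message) {
        List<Double> messages = inbox.get(targetId);
        if (messages == null) {
            // Destination was never declared: create it so that it can
            // run in the superstep after the one that sent the message.
            messages = new ArrayList<>();
            inbox.put(targetId, messages);
            createdImplicitly++;
        }
        messages.add(message);
    }

    public boolean exists(String id) {
        return inbox.containsKey(id);
    }

    public int createdImplicitly() {
        return createdImplicitly;
    }

    public int pendingMessages(String id) {
        List<Double> messages = inbox.get(id);
        return messages == null ? 0 : messages.size();
    }

    public static void main(String[] args) {
        VertexResolutionSketch graph = new VertexResolutionSketch();
        graph.addFromInput("a");   // "a" has a line in the input file
        graph.deliver("b", 0.5);   // "b" only appears as an edge target
        System.out.println("b exists: " + graph.exists("b"));
        System.out.println("implicitly created: " + graph.createdImplicitly());
    }
}
```

With a sorted map in place of the HashMap, the lookup in deliver() would cost O(log n) per message instead of expected O(1), which is the concern raised about a TreeMap-based out-of-core store.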


On Tue, May 29, 2012 at 3:23 PM, Sebastian Schelter <ssc@apache.org> wrote:
> Oh sorry, I didn't know that discussion. The problem I see is that in
> every implementation, a user might run into this issue, and I don't
> think it's ideal to force users to always run a round of sending empty
> messages at the beginning.
>
> Maybe the system should (somehow) automagically do that for the users?
> Really seems to be an awkward situation though...
>
> --sebastian
>
>
>
> On 29.05.2012 15:03, Claudio Martella wrote:
>> About the mapreduce job to prepare the inputset, I did advocate for
>> this solution instead of supporting automatic creation of non-existent
>> vertices implicitly (which I believe adds a logical path in vertex
>> resolution which has some drawbacks, e.g. you have to check in the
>> hashmap for the existence of the destination vertex for each message,
>> which is "fine" now that it's a hashmap, but it's going to be less
>> fine when/if we turn to TreeMap for out-of-core).
>>
>> Unfortunately the other committers preferred going for the path that
>> helps userland's life, so I guess this solution is not to be
>> considered here either.
>>
>> On Tue, May 29, 2012 at 1:48 PM, Sebastian Schelter <ssc@apache.org> wrote:
>>> On 29.05.2012 13:13, Paolo Castagna wrote:
>>>> Hi Sebastian
>>>>
>>>> Sebastian Schelter wrote:
>>>>> Why do you only recompute the pageRank in each second superstep? Can we
>>>>> not use the aggregated value of the dangling nodes from the last superstep?
>>>>
>>>> I removed the computing of PageRank values every second superstep.
>>>> However, I needed to use a couple of aggregators for the dangling nodes
>>>> contribution instead of just one: "dangling-current" and "dangling-previous".
>>>>
>>>> Each superstep, I need to reset the dangling-current aggregator; at the
>>>> same time, I need to know the value of the aggregator at the previous
>>>> superstep.
>>>
>>> You can save the value from the previous step in a static variable in
>>> the WorkerContext before resetting the aggregator.
>>>
>>>>
>>>> I hope it makes sense, let me know if you have a better idea.
>>>>
>>>>> Overall I think we're well on the way to a robust, real-world PageRank
>>>>> implementation. I managed to implement the convergence check with an
>>>>> aggregator and will post an updated patch soon.
>>>>
>>>> I think I've just done it, have a look [1] and let me know if you would have
>>>> done it differently.
>>>>
>>>> Paolo
>>>>
>>>>  [1]
>>>> https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90
>>>>
>>>>
>>>
>>
>>
>>
>
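
The dangling-current / dangling-previous bookkeeping discussed in the quoted messages can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code from the linked PageRankVertex or any Giraph API: the class DanglingMassSketch and its methods are hypothetical, and the synchronous loop in run() stands in for supersteps, where values aggregated in one superstep are read in the next. The point it shows is saving the current total before resetting it, so the previous superstep's dangling mass can be redistributed evenly.

```java
import java.util.Arrays;

/** Toy model of two dangling-node accumulators; hypothetical, not Giraph's API. */
public class DanglingMassSketch {
    private double danglingCurrent = 0.0;   // filled during the running superstep
    private double danglingPrevious = 0.0;  // total saved from the last superstep

    /** Called once per dangling vertex with its current rank. */
    public void aggregate(double rank) {
        danglingCurrent += rank;
    }

    /** Save the finished total first, then reset for the next superstep. */
    public void endSuperstep() {
        danglingPrevious = danglingCurrent;
        danglingCurrent = 0.0;
    }

    public double current() {
        return danglingCurrent;
    }

    public double previous() {
        return danglingPrevious;
    }

    /** outEdges[v] lists the targets of vertex v; an empty array marks a dangling node. */
    public static double[] run(int[][] outEdges, int supersteps, double damping) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        DanglingMassSketch dangling = new DanglingMassSketch();

        for (int step = 0; step < supersteps; step++) {
            double[] incoming = new double[n];
            for (int v = 0; v < n; v++) {
                if (outEdges[v].length == 0) {
                    dangling.aggregate(rank[v]);  // no out-links: pool the rank
                } else {
                    double share = rank[v] / outEdges[v].length;
                    for (int target : outEdges[v]) {
                        incoming[target] += share;
                    }
                }
            }
            dangling.endSuperstep();
            // Spread the pooled dangling rank evenly over all vertices.
            double danglingShare = dangling.previous() / n;
            for (int v = 0; v < n; v++) {
                rank[v] = (1.0 - damping) / n + damping * (incoming[v] + danglingShare);
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        int[][] graph = { {1}, {2}, {} };  // vertex 2 is dangling
        System.out.println(Arrays.toString(run(graph, 20, 0.85)));
    }
}
```

Because the dangling mass is redistributed rather than dropped, the ranks keep summing to 1 across supersteps, which is a convenient sanity check for this kind of implementation.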



-- 
   Claudio Martella
   claudio.martella@gmail.com
