hama-user mailing list archives

From Steven van Beelen <smcvbee...@gmail.com>
Subject Re: Possible Aggregator Problem
Date Wed, 17 Apr 2013 15:08:33 GMT
Additionally, I found this in the mail archives:
http://mail-archives.apache.org/mod_mbox/hama-user/201210.mbox/%3CCAJ-=ys=W8F5W4aduV+=+yfsvh41xSa22-wNqQRKapadZD+QBag@mail.gmail.com%3E
It covers exactly the point I was making. Is calling two different
aggregate functions in a row still considered a bug?
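
To be explicit about what I mean by the two functions: in an
AbsDiffAggregator-style class both overloads feed the same running sum,
roughly as in the sketch below. The class body and names are only my own
illustration of the situation, not the actual Hama source; only the two
aggregate signatures are taken from the aggregator API.

  import org.apache.hadoop.io.DoubleWritable;
  import org.apache.hama.graph.AbstractAggregator;
  import org.apache.hama.graph.Vertex;

  // Illustration only: both overloads add into the same field, so when
  // they are called in a row the plain vertex values get mixed into the
  // sum of absolute differences.
  public class TwoOverloadAggregator extends
      AbstractAggregator<DoubleWritable, Vertex<?, ?, DoubleWritable>> {

    private double absoluteDifference = 0.0d;

    // overload 1: called with the plain vertex value
    @Override
    public void aggregate(Vertex<?, ?, DoubleWritable> vertex,
        DoubleWritable value) {
      absoluteDifference += value.get();
    }

    // overload 2: called with the old and the new vertex value
    @Override
    public void aggregate(Vertex<?, ?, DoubleWritable> vertex,
        DoubleWritable oldValue, DoubleWritable newValue) {
      absoluteDifference += Math.abs(oldValue.get() - newValue.get());
    }

    @Override
    public DoubleWritable getValue() {
      return new DoubleWritable(absoluteDifference);
    }
  }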


On Wed, Apr 17, 2013 at 2:35 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:

> Hi Thomas,
>
> Then I guess I did not explain myself clearly.
> What you describe is indeed how I expect the AverageAggregator to work,
> but when I use the AverageAggregator in my own PageRank implementation it
> does not return the average of all absolute differences, just the mean of
> all the vertex values (the sum of the values divided by the vertex count).
>
> The (very) small example graph I use has only five vertices, where the
> sum of all vertex values is always 1.0.
> When I use the AverageAggregator, calling the getLastAggregatedValue
> method therefore always returns 0.2, i.e. 1.0 / 5: exactly the plain mean
> of the values rather than an average difference.
> It shouldn't do that, right?
>
>
> On Wed, Apr 17, 2013 at 1:18 PM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:
>
>> Hi Steven,
>>
>> the AverageAggregator is used to determine the average of all absolute
>> differences between the old and the new pagerank value of every vertex.
>> The intended behaviour is documented in the javadoc of the classes
>> involved, and it suffices to track whether the pagerank values have
>> converged yet or not.
>>
>> What you describe is a perfectly valid way to track the pagerank
>> difference throughout all supersteps, but it is not how (imho) the
>> AverageAggregator should behave, so you have to write your own.
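>>
>> For what it's worth, tracking convergence then boils down to a few lines
>> at the start of the vertex's compute() method, roughly like the sketch
>> below (the threshold and the aggregator index 0 are just examples):
>>
>>   // read what the AverageAggregator produced for the previous superstep;
>>   // it may not be available yet in the very first supersteps
>>   DoubleWritable lastDiff = getLastAggregatedValue(0);
>>   if (lastDiff != null && lastDiff.get() < 0.001d) {
>>     voteToHalt();  // the pagerank values have converged, stop this vertex
>>     return;
>>   }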
>>
>>
>> 2013/4/17 Steven van Beelen <smcvbeelen@gmail.com>
>>
>> > The values in my case are the DoubleWritable values that each vertex
>> > holds and that the aggregators aggregate on.
>> > My tests showed that, when the aggregator was set to AverageAggregator,
>> > the average of all the vertex values from the previous compute step was
>> > returned. AverageAggregator should instead return the average difference
>> > over the old/new value pairs of every vertex, not the mean of the values.
>> > That average difference is then used to check whether convergence has
>> > been reached, which is of course relevant for all tasks.
>> >
>> > Hence, the convergence point for which the aggregator is used will never
>> > be reached, so the algorithm always runs the maximum number of
>> > iterations that was set (30 iterations in the PageRank example).
>> > I experienced the same with my own PageRank implementation.
>> >
>> > I think it has something to do with the finalizeAggregation step.
>> > On top of that, both the 'aggregate(VERTEX vertex, M value)' and the
>> > 'aggregate(VERTEX vertex, M oldValue, M newValue)' methods are called
>> > every time, where one would think only the second (with old/new values)
>> > would suffice. Because of this, the global variable 'absoluteDifference'
>> > in the 'AbsDiffAggregator' class is overwritten/overruled by the first
>> > aggregate call. Additionally, when I made my own aggregator class in the
>> > same fashion as AbsDiffAggregator and AverageAggregator but left out
>> > 'aggregate(VERTEX vertex, M value)', the output turned out to be 0.0000
>> > every time.
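>> >
>> > To be concrete about that finalize step, in my own class it looks
>> > roughly like this (the counter field is mine; only the method name
>> > finalizeAggregation comes from the aggregator API):
>> >
>> >   // timesAggregated is only increased inside aggregate(vertex, value),
>> >   // so when that overload is left out the counter stays at zero and
>> >   // the guard below yields the 0.0000 I am seeing
>> >   @Override
>> >   public DoubleWritable finalizeAggregation() {
>> >     return new DoubleWritable(
>> >         timesAggregated == 0 ? 0d : absoluteDifference / timesAggregated);
>> >   }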
>> >
>> > I hope I made myself clear.
>> > Regards
>> >
>> >
>> > On Wed, Apr 17, 2013 at 11:57 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >
>> > > Thanks for your report.
>> > >
>> > > What's the meaning of 'all the values'? Please give me more details
>> > > about your problem.
>> > >
>> > > I didn't look closely at the 'dangling links & aggregators' part of
>> > > the PageRank example, but I think there's no bug. Aggregators are just
>> > > used for global communication. For example, finding the maximum
>> > > value[1] can be done in only one iteration using MaxValueAggregator.
>> > >
>> > > 1. http://cdn.dejanseo.com.au/wp-content/uploads/2011/06/supersteps.png
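>> > >
>> > > Roughly, the idea is the sketch below (the aggregator index 0 is just
>> > > an example, and the exact method names should be checked against the
>> > > javadoc):
>> > >
>> > >   // register the aggregator once when setting up the graph job
>> > >   job.setAggregatorClass(MaxValueAggregator.class);
>> > >
>> > >   // then, inside Vertex.compute() one superstep later, every vertex
>> > >   // can read back the globally aggregated maximum
>> > >   DoubleWritable globalMax = getLastAggregatedValue(0);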
>> > >
>> > > On Wed, Apr 17, 2013 at 6:27 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
>> > > > Hello,
>> > > >
>> > > > I'm writing my own PageRank in Hama for testing, and I think I found
>> > > > a problem with the AverageAggregator. I'm not sure whether it is me
>> > > > or the AverageAggregator class in general, but I believe it just
>> > > > returns the mean of all the values instead of the intended average
>> > > > difference between the old and new values.
>> > > >
>> > > > For testing, I created my own AbsDiffAggregator and AverageAggregator
>> > > > classes, using FloatWritable instead of DoubleWritable. The same
>> > > > problem still occurred: I got the mean of all the values in the graph
>> > > > instead of an average difference.
>> > > >
>> > > > Could someone tell me if I'm doing something wrong, or what I should
>> > > > provide to explain my problem better?
>> > > >
>> > > > Regards,
>> > > > Steven van Beelen, Vrije Universiteit Amsterdam
>> > >
>> > >
>> > >
>> > > --
>> > > Best Regards, Edward J. Yoon
>> > > @eddieyoon
>> > >
>> >
>>
>
>
