giraph-user mailing list archives

From Matthew Saltz <sal...@gmail.com>
Subject Re: Issue in Aggregator
Date Sun, 09 Nov 2014 00:13:35 GMT
Hi Puneet,

It's unclear to me what behavior you want from the aggregator. Are you
saying you want an aggregator whose final output is the aggregated value
for just one particular worker? With an aggregator you should at least
make sure the operation you're performing is commutative and associative;
that is, neither the order nor the grouping in which items are aggregated
should matter, unless it is explicitly dealt with somehow. Otherwise you'll
get unpredictable results.
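
To illustrate the point about ordering, here is a minimal standalone sketch
in plain Java. It is not Giraph code: the class name, the two merge
functions, and the worker data are all hypothetical, and String stands in
for Text. It assumes each worker folds its own values into a partial
result and that the partials are then combined with the same merge
function, in whatever order they happen to arrive. In a real aggregator the
merge logic would presumably live in the aggregate() method of your class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

final class AggregatorOrderSketch {

    // Order-dependent merge: whichever value arrives last replaces the rest.
    static final BinaryOperator<String> LAST_WINS =
            (current, incoming) -> incoming;

    // Order-independent merge: keep the lexicographically largest value.
    static final BinaryOperator<String> MAX =
            (current, incoming) ->
                    incoming.compareTo(current) > 0 ? incoming : current;

    // Fold one worker's values into a partial aggregate.
    static String aggregateLocally(List<String> values, String initial,
                                   BinaryOperator<String> merge) {
        String result = initial;
        for (String value : values) {
            result = merge.apply(result, value);
        }
        return result;
    }

    // Combine the per-worker partials with the same merge function,
    // in the order the workers appear in the list.
    static String aggregateGlobally(List<List<String>> workers, String initial,
                                    BinaryOperator<String> merge) {
        String result = initial;
        for (List<String> workerValues : workers) {
            String partial = aggregateLocally(workerValues, initial, merge);
            result = merge.apply(result, partial);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> w1 = Arrays.asList("apple", "pear");
        List<String> w2 = Arrays.asList("zebra", "fig");

        // LAST_WINS: the result depends on which worker reports last.
        System.out.println(aggregateGlobally(Arrays.asList(w1, w2), "", LAST_WINS));
        System.out.println(aggregateGlobally(Arrays.asList(w2, w1), "", LAST_WINS));

        // MAX: the result is the same for either arrival order.
        System.out.println(aggregateGlobally(Arrays.asList(w1, w2), "", MAX));
        System.out.println(aggregateGlobally(Arrays.asList(w2, w1), "", MAX));
    }
}
```

With a single worker there is only one arrival order, which may be why an
order-dependent merge can look correct on a laptop and then appear to
return one worker's local value on a cluster.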

Best,
Matthew Saltz
On 08/11/2014 15:05, "Puneet Agarwal" <puagarwal@yahoo.com> wrote:

> Hi All,
> In my algo, I use an Aggregator which takes a Text value. I have written
> my custom aggregator class for this, as given below.
>
> public class MyAgg extends BasicAggregator<Text> {
> ...
> }
>
> This works fine when running on my laptop with one worker.
> However, when running it on the cluster, sometimes it does not return the
> correctly aggregated value.
> It seems it is returning the locally aggregated value of one of the
> workers.
> Instead, it should have used my logic to decide which of the aggregated
> values sent by the various workers is chosen as the final aggregated value.
> (In fact, I have not written such code anywhere, so it is doing the best
> it can.)
>
> Following is my analysis of this issue.
> a.    I guess every worker aggregates the values locally.
> b.    Then there is a global aggregation step, which simply compares the
> values sent by the various workers.
> c.    For global aggregation it uses the Text.compareTo() method. This
> method is a default Hadoop implementation and does not include the logic
> of my program.
> d.    It seems that, because of the above, the value returned by my
> aggregator on the cluster is not actually globally aggregated; instead,
> the locally aggregated value of one of the workers gets taken.
>
> If the above analysis is correct, following is how I think I can solve
> this.
> I should write my own class that implements the Writable interface. In
> this class I would also write a compareTo method, and as a result things
> should start working fine.
>
> If it were using the MyAgg class itself to decide which of the values
> returned by the various workers should be taken as the globally aggregated
> value, then this problem would not have occurred.
>
> *I seek your guidance whether my analysis is correct.*
>
> - Puneet
> IIT Delhi, India
>
>
