giraph-user mailing list archives

From Puneet Agarwal <puagar...@yahoo.com>
Subject Re: Issue in Aggregator
Date Tue, 11 Nov 2014 19:36:12 GMT
I am stuck on this and am requesting help again.

Please tell me how Giraph computes the global value of an aggregator from the various values
it receives from the local aggregators. Does it call the same class? It seems not; please confirm.
 

     On Sunday, November 9, 2014 11:48 PM, Puneet Agarwal <puagarwal@yahoo.com> wrote:
   

 Dear Matthew,
Yes, my aggregator is commutative, and I want the globally aggregated value from all workers,
not the locally aggregated one.
But I am getting a locally aggregated value.
My question is:
Does Giraph call the aggregator class on the values returned by the local aggregators or
not?
- Puneet
IIT Delhi, India
 

     On Sunday, November 9, 2014 5:43 AM, Matthew Saltz <saltzm@gmail.com> wrote:
   

 Hi Puneet,
It's unclear to me what you want in terms of aggregator behavior. Are you
saying you want an aggregator whose final output is the aggregated value for just
one particular worker? With an aggregator you should at least make sure the operations you're
performing are commutative; that is, the order in which items are aggregated should not matter
unless it is explicitly dealt with somehow. Otherwise you'll get unpredictable results.
Best,
Matthew Saltz

On 08/11/2014 15:05, "Puneet Agarwal" <puagarwal@yahoo.com> wrote:

Hi All,
In my algorithm, I use an aggregator that takes a Text value. I have written a custom
aggregator class for this, as given below.

public class MyAgg extends BasicAggregator<Text> {...}

This works fine when running on my laptop with one worker. However, when running on the
cluster, it sometimes does not return the correctly aggregated value. It seems to return
the locally aggregated value of one of the workers, whereas it should have used my logic to decide
which of the aggregated values sent by the various workers is chosen as the final aggregated
value. (In fact, I have not written such code anywhere, so it is doing the best
it can.)

The following is my analysis of this issue.
a.    I guess every worker aggregates the values locally.
b.    Then there is a global aggregation step, which simply compares the values sent by the
      various local aggregators.
c.    For global aggregation it uses the Text.compareTo() method. Text.compareTo() is a default
      Hadoop implementation and does not include the logic of my program.
d.    It seems that, because of the above, the value returned by my aggregator on the cluster
      is not actually globally aggregated; instead, the locally aggregated value of one of
      the workers gets taken.
If the above analysis is correct, this is how I think I can solve it: I should write
my own class that implements the Writable interface. In this class I would also write a compareTo
method, and as a result things should start working fine.
If Giraph used the MyAgg class itself to decide which of the values returned by the various workers
should be taken as the globally aggregated value, this problem would not have occurred.
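
To make the last point concrete, here is a minimal sketch in plain Java (no Giraph or Hadoop classes) of two-level aggregation in which the same merge rule is applied first to each worker's values and then to the workers' partial results. The merge rule (keep the longest string, breaking ties lexicographically) and all class and method names are hypothetical stand-ins for whatever MyAgg does; this only illustrates the idea and is not a description of how Giraph is implemented.

```java
import java.util.Arrays;
import java.util.List;

public class TwoLevelAggregation {

    // Hypothetical merge rule standing in for the logic inside MyAgg:
    // keep the longer string; on equal length keep the lexicographically larger one.
    // Taking the maximum under a total order is commutative and associative.
    static String merge(String a, String b) {
        if (a.length() != b.length()) {
            return a.length() > b.length() ? a : b;
        }
        return a.compareTo(b) >= 0 ? a : b;
    }

    // Local step: fold all values seen by one worker into a partial result.
    static String aggregateLocally(List<String> values) {
        String result = ""; // initial value; it loses against any non-empty string
        for (String value : values) {
            result = merge(result, value);
        }
        return result;
    }

    // Global step: fold the workers' partial results with the SAME merge rule.
    static String aggregateGlobally(List<List<String>> workers) {
        String result = "";
        for (List<String> workerValues : workers) {
            result = merge(result, aggregateLocally(workerValues));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> workers = Arrays.asList(
                Arrays.asList("a", "bbb"),   // worker 1
                Arrays.asList("cc", "dddd"), // worker 2
                Arrays.asList("e"));         // worker 3
        System.out.println(aggregateGlobally(workers));
    }
}
```

Because the merge rule is commutative and associative, the result does not depend on how the values are split across workers or on the order in which the partial results arrive, which is the property Matthew asked about.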

I seek your guidance on whether my analysis is correct.
- Puneet
IIT Delhi, India




    

   