hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Aggregators in Hama
Date Wed, 04 Jul 2012 05:24:33 GMT
Hi Praveen,

you have completely misunderstood how aggregators work, and it would be
great if you could look into the source code before asking unnecessary
questions. I don't know why Edward is voting that up.

> 1. Why send the value when it can be got from the vertex?


Nothing is being sent here; what is sent is defined by the aggregator and
not by the method signature.
Example: with SumAggregator, the only thing that ever gets sent is what is
returned by getValue().
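
To illustrate the point, here is a minimal sketch. The Vertex and Aggregator
types below are simplified, hypothetical stand-ins modeled on the
aggregate(VERTEX vertex, M value) signature quoted in this thread, not the
actual Hama classes. The vertex is only a local argument; the value returned
by getValue() is the only thing that would go over the wire.

```java
import java.util.Arrays;
import java.util.List;

public class SumAggregatorSketch {

    // Hypothetical stand-in for a vertex; not the real Hama Vertex class.
    static final class Vertex {
        final String id;
        final List<String> edges;

        Vertex(String id, List<String> edges) {
            this.id = id;
            this.edges = edges;
        }
    }

    // Simplified contract modeled on aggregate(VERTEX vertex, M value).
    interface Aggregator<M> {
        void aggregate(Vertex vertex, M value); // called locally, once per vertex
        M getValue();                           // the only result that would be sent
    }

    // Keeps a running sum; the vertex argument is never stored or serialized.
    static final class SumAggregator implements Aggregator<Integer> {
        private int sum = 0;

        @Override
        public void aggregate(Vertex vertex, Integer value) {
            sum += value;
        }

        @Override
        public Integer getValue() {
            return sum;
        }
    }

    public static void main(String[] args) {
        SumAggregator agg = new SumAggregator();
        agg.aggregate(new Vertex("a", Arrays.asList("b", "c")), 3);
        agg.aggregate(new Vertex("b", Arrays.asList("a")), 4);
        System.out.println(agg.getValue()); // prints 7
    }
}
```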

> 2. Why send the complete vertex?


Do you really think we send the whole vertex? That is ridiculous.

> 3. o.a.giraph.graph.Aggregator class has a better interface where only the
> values to be aggregated are sent over the wire.


See point 1. No, we have the better interface, because you can observe other
vertex attributes like the number of edges or previously aggregated values.
That is possible with Giraph, because there you use Aggregators yourself in
the vertex code, whereas Hama hides this usage (which is what a framework is
for).
If you're unhappy with that, feel free to change it.
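
As a rough sketch of why the vertex is passed in: an aggregator can read
vertex attributes such as the number of edges and still emit only a single
number. The types below are again hypothetical, simplified stand-ins and not
the actual Hama classes.

```java
import java.util.Arrays;
import java.util.List;

public class EdgeCountAggregatorSketch {

    // Hypothetical stand-in for a vertex; not the real Hama Vertex class.
    static final class Vertex {
        final String id;
        final List<String> edges;

        Vertex(String id, List<String> edges) {
            this.id = id;
            this.edges = edges;
        }
    }

    // Simplified contract modeled on aggregate(VERTEX vertex, M value).
    interface Aggregator<M> {
        void aggregate(Vertex vertex, M value);
        M getValue();
    }

    // Ignores the vertex value and sums an attribute read from the vertex itself.
    static final class EdgeCountAggregator implements Aggregator<Integer> {
        private int edgeCount = 0;

        @Override
        public void aggregate(Vertex vertex, Integer value) {
            edgeCount += vertex.edges.size();
        }

        @Override
        public Integer getValue() {
            return edgeCount;
        }
    }

    public static void main(String[] args) {
        EdgeCountAggregator agg = new EdgeCountAggregator();
        agg.aggregate(new Vertex("a", Arrays.asList("b", "c")), 0);
        agg.aggregate(new Vertex("b", Arrays.asList("a")), 0);
        System.out.println(agg.getValue()); // prints 3
    }
}
```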

> 4. Also, will there be a requirement to do the aggregation in only some
> super steps and not all. Let say, to calculate the number of vectors/edges
> in the input and the output graph. In this scenario, aggregation in the
> first and last super step should be good.


Yes, that is fine; however, a sum aggregator in each superstep is just a 4
byte message with minimal instruction overhead, so I'm pretty sure it is no
big problem to run them in every superstep.
For everything else you can use Counters; there is a JIRA that makes them
more "realtime" and makes a counter that has been set available to all peers
in the next superstep.
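
For the 4 byte figure, a small sketch using only the JDK shows that an int
sum serializes to exactly four bytes of payload. The class and method names
are made up for illustration, and any framing or header bytes that the
messaging layer adds are not counted here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SumPayloadSize {

    // Serializes one int with DataOutputStream and returns the raw bytes.
    static byte[] encode(int sum) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(sum); // writeInt always writes four bytes, high byte first
        } catch (IOException e) {
            // Not expected for an in-memory buffer; rethrow unchecked if it happens.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode(123456).length + " bytes"); // prints "4 bytes"
    }
}
```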

2012/7/4 Edward J. Yoon <edwardyoon@apache.org>

> +1
>
> On Wed, Jul 4, 2012 at 12:46 PM, Praveen Sripati
> <praveensripati@gmail.com> wrote:
> > The o.a.hama.graph.Aggregator interface has the following method
> >
> >   public void aggregate(VERTEX vertex, M value);
> >
> > Couple of things
> >
> > 1. Why send the value when it can be got from the vertex?
> >
> > 2. Why send the complete vertex? In case of semi clustering as described in
> > the Google Pregel paper, each vertex maintains a list of semi clusters and
> > the data associated with it. Since, all the vertices are sent to the master
> > in each superstep this might be a bottleneck with huge graphs.
> >
> > 3. o.a.giraph.graph.Aggregator class has a better interface where only the
> > values to be aggregated are sent over the wire.
> >
> > 4. Also, will there be a requirement to do the aggregation in only some
> > super steps and not all. Let say, to calculate the number of vectors/edges
> > in the input and the output graph. In this scenario, aggregation in the
> > first and last super step should be good.
> >
> > Any thoughts? Should I open a JIRA for the same.
> >
> > Thanks,
> > Praveen
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
