hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Graph API aggregators
Date Mon, 30 Dec 2013 07:16:53 GMT
Hi Anastasis,

Let's re-read the "Aggregators" section of the Pregel paper.

1) "Each vertex can provide a value to an aggregator in superstep S,
... the aggregated value is available to all the vertices in superstep
S + 1".

So I think we should add aggregator interfaces to "Vertex", like below:

/**
 * Aggregator registration. Each aggregator can have its own
 * user-defined aggregation function.
 */
public void registerAggregator(String name,
    Class<? extends Aggregator> aggregatorClass);

/**
 * Provides the value to the specified aggregator.
 *
 * @param name identifies an aggregator
 * @param value value to be aggregated
 */
public void aggregate(String name, M value);

/**
 * Returns the value of the specified aggregator.
 */
public M getAggregatedValue(String name);
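
To make the proposal concrete, here is a minimal sketch of how the three methods could behave together. Everything beyond the three method names is an assumption for illustration: `SketchAggregator`, `MinSketch`, `AggregatorRegistry` and the `endSuperstep()` barrier hook are hypothetical and are not Hama's actual classes. The sketch follows the Pregel rule quoted in 1): values provided in superstep S become readable only after the barrier.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical aggregator contract; Hama's real Aggregator interface may differ. */
interface SketchAggregator<M> {
  void aggregate(M value); // fold one vertex-provided value into the running result
  M getValue();            // running result, or null if nothing was provided
}

/** Example user-defined aggregation function: keep the smallest value. */
class MinSketch implements SketchAggregator<Integer> {
  private Integer min;

  @Override
  public void aggregate(Integer value) {
    if (min == null || value < min) {
      min = value;
    }
  }

  @Override
  public Integer getValue() {
    return min;
  }
}

/** Hypothetical holder for the three proposed methods plus a superstep barrier. */
class AggregatorRegistry<M> {
  private final Map<String, Class<? extends SketchAggregator<M>>> classes = new HashMap<>();
  private final Map<String, SketchAggregator<M>> running = new HashMap<>();
  private final Map<String, M> published = new HashMap<>();

  public void registerAggregator(String name,
      Class<? extends SketchAggregator<M>> aggregatorClass) {
    classes.put(name, aggregatorClass);
    running.put(name, newInstance(aggregatorClass));
  }

  public void aggregate(String name, M value) {
    SketchAggregator<M> agg = running.get(name);
    if (agg == null) {
      throw new IllegalArgumentException("Unknown aggregator: " + name);
    }
    agg.aggregate(value);
  }

  /** Returns the value published at the last barrier, or null before the first one. */
  public M getAggregatedValue(String name) {
    return published.get(name);
  }

  /** Assumed barrier hook: publish the results of superstep S and reset for S + 1. */
  public void endSuperstep() {
    for (Map.Entry<String, Class<? extends SketchAggregator<M>>> e : classes.entrySet()) {
      published.put(e.getKey(), running.get(e.getKey()).getValue());
      running.put(e.getKey(), newInstance(e.getValue()));
    }
  }

  private SketchAggregator<M> newInstance(Class<? extends SketchAggregator<M>> cls) {
    try {
      return cls.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException ex) {
      throw new IllegalStateException("Cannot instantiate " + cls.getName(), ex);
    }
  }
}
```

Registering by class rather than by instance mirrors the proposed signature and lets the framework create a fresh aggregator for every superstep.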

Actually, this is more flexible to use and makes it easier to manage
multiple aggregators. For example, a vertex can have both Min and Max
aggregators:

this.aggregate(MIN_AGGREGATOR, this.getValue());

Then we can find the min and max values of the vertices in the next superstep.
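
A small sketch of the Min/Max case, under stated assumptions: `NamedAggregators` and `MinMaxDemo` are hypothetical stand-ins, `MAX_AGGREGATOR` is an assumed name (only `MIN_AGGREGATOR` appears above), and the aggregation function is passed as a `BinaryOperator` instead of a class to keep the example short. The superstep barrier is left out; the values are simply read after every vertex has contributed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Hypothetical stand-in for name-based aggregators on a vertex. */
class NamedAggregators {
  static final String MIN_AGGREGATOR = "min";
  static final String MAX_AGGREGATOR = "max"; // assumed name, not from the original mail

  private final Map<String, BinaryOperator<Integer>> functions = new HashMap<>();
  private final Map<String, Integer> values = new HashMap<>();

  void registerAggregator(String name, BinaryOperator<Integer> function) {
    functions.put(name, function);
  }

  void aggregate(String name, Integer value) {
    BinaryOperator<Integer> function = functions.get(name);
    if (function == null) {
      throw new IllegalArgumentException("Unknown aggregator: " + name);
    }
    // The first value is stored as-is; later values are combined with the function.
    values.merge(name, value, function);
  }

  Integer getAggregatedValue(String name) {
    return values.get(name);
  }
}

class MinMaxDemo {
  /** Feeds every vertex value to both aggregators, as each vertex would in compute(). */
  static NamedAggregators run(int[] vertexValues) {
    NamedAggregators aggregators = new NamedAggregators();
    aggregators.registerAggregator(NamedAggregators.MIN_AGGREGATOR, Integer::min);
    aggregators.registerAggregator(NamedAggregators.MAX_AGGREGATOR, Integer::max);
    for (int value : vertexValues) {
      aggregators.aggregate(NamedAggregators.MIN_AGGREGATOR, value);
      aggregators.aggregate(NamedAggregators.MAX_AGGREGATOR, value);
    }
    return aggregators;
  }
}
```

Because each aggregator is addressed by name, adding a third one (a sum, for example) would not change the vertex-side calls.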

2) "Aggregators can also be used for statistics and for global
coordination, .... another branch can be executed until termination".

Here, my question was, "how can we get the historical minimum value
in each superstep?" or "how can we get the average or sum of the
values of all vertices?". It's impossible without aggregating the
values of all (active and inactive) vertices.

BTW, I just noticed that Giraph provides two types of aggregator
behaviour: regular and persistent. In my eyes, their solution is more
reasonable than voteToStop, ..., skipAggregator().
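
As I understand the distinction, a regular aggregator is reset at every superstep while a persistent one keeps its value across supersteps. A minimal sketch of that difference for a Min aggregator follows; `MinWithPolicy` and `endSuperstep()` are hypothetical names for illustration and are not Giraph's or Hama's API.

```java
/** Hypothetical Min aggregator whose reset policy is chosen at construction. */
class MinWithPolicy {
  private final boolean persistent;
  private Integer running;   // value being built during the current superstep
  private Integer published; // value visible to vertices in the next superstep

  MinWithPolicy(boolean persistent) {
    this.persistent = persistent;
  }

  void aggregate(int value) {
    if (running == null || value < running) {
      running = value;
    }
  }

  Integer getAggregatedValue() {
    return published;
  }

  /** Assumed barrier hook, called once between supersteps. */
  void endSuperstep() {
    published = running;
    if (!persistent) {
      running = null; // regular: start from scratch in every superstep
    }
    // persistent: keep the running value, so it becomes the historical minimum
  }
}
```

With the persistent policy, the "historical minimum" question above is answered without forcing inactive vertices to contribute again in every superstep.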

What do you think?

On Mon, Dec 30, 2013 at 2:00 AM, Anastasis Andronidis
<andronat_asf@hotmail.com> wrote:
> Hello and sorry for my late response,
> I am sending this email so we can plan and further confirm the issues
> with aggregators.
> As far as I understood from the discussion on HAMA-833, we should split
> the functionality of the aggregators into 2 different kinds:
> 1) global aggregators -> will aggregate information from every vertex in
> every superstep, ignoring the fact that a vertex may be in a voteToHalt
> state.
> 2) private aggregators -> will aggregate information on demand of the
> user.
> I also want to confirm that we don't need any other states for vertices
> like those that were proposed in HAMA-833:
>> 1) voteToStop(): Immediately stop the vertex compute and suppress any
>> further calculations on top of that (e.g. aggregation).
>> 2) voteToTerminate(): Immediately stop the vertex compute, suppress any
>> further calculations on top of that and deactivate the vertex so even if
>> any message reaches it, will not come
> One last thing is the issue of the skipAggregator() functionality. The
> idea behind this function was to control the calculation of aggregators.
> e.g.
> Let's say we have an aggregator that is slow because it needs to perform
> some complex calculations. But we also need to perform this heavy
> calculation every 4 supersteps. The skipAggregator() method can provide
> a solution to this problem by specifically requesting to skip the
> aggregation on the next superstep.
> But if we implement the private aggregators, we might not need that
> feature, as the user will have absolute control over aggregators.
> Cheers,
> Anastasis
> P.S.
> The name "private aggregator" is for the current discussion; if you have
> any better names, please reply.

Best Regards, Edward J. Yoon
