giraph-user mailing list archives

From Matthew Saltz <sal...@gmail.com>
Subject Aggregator result type different from aggregate input type
Date Thu, 17 Jul 2014 09:53:19 GMT
Hi everyone,

I'm trying to implement my own aggregator, whose aggregated value should be
a Map (for which I can use MapWritable) from an id (LongWritable) to a
custom defined type (which simply extends Writable) that contains several
aggregate metrics. I want vertices to be able to do something along the
lines of

aggregate(MY_MAP_AGGREGATOR, new MyAggregatorMessage(id, stat1, stat2));

and then the map aggregator will do something like

public void aggregate(MyAggregatorMessage m) {

    MapWritable currentMap = (MapWritable) getAggregatedValue();

    if (!currentMap.containsKey(m.getId())) {
        // MyAggregatorData contains the aggregate info I want to keep for
        // each id. Contains init. values for stat1 and stat2
        currentMap.put(m.getId(), new MyAggregatorData());
    }

    // MapWritable.get() returns a Writable, so the value has to be cast
    MyAggregatorData oldData = (MyAggregatorData) currentMap.get(m.getId());
    // Performs appropriate aggregates for each stat and stores it. Sum,
    // average, whatever
    oldData.aggregate(m.getStat1(), m.getStat2());
}

However, the problem is that the method signatures for Aggregator
<https://giraph.apache.org/apidocs/org/apache/giraph/aggregators/Aggregator.html>
all have to use the same type. In other words, I can't have

public MapWritable getAggregatedValue()

and

public void aggregate (MyAggregatorMessage m)

because the types are different.

My idea right now is to use a MyAggregatorWritable class that extends
GenericWritable
<http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/GenericWritable.html>
to wrap both MyAggregatorMessage and MapWritable, then use that class in
both method signatures and deal with the rest through casting. I've
already used GenericWritable for something else, so the implementation would
be straightforward.
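
Roughly, the wrapper idea would look like the sketch below. It uses plain
Java stand-ins so that it compiles without Hadoop or Giraph on the classpath:
Wrapper plays the role of the GenericWritable subclass, Message and Data stand
in for MyAggregatorMessage and MyAggregatorData, and a HashMap replaces
MapWritable. All of these names, and the choice of summing the two stats, are
assumptions for illustration only. The map branch assumes that partial maps
may also be handed to aggregate(), which I have not verified.

```java
import java.util.HashMap;
import java.util.Map;

class WrapperSketch {

    /** Hypothetical stand-in for MyAggregatorData: running sums of two stats. */
    static class Data {
        double stat1;
        double stat2;

        void aggregate(double s1, double s2) {
            stat1 += s1;
            stat2 += s2;
        }
    }

    /** Hypothetical stand-in for MyAggregatorMessage. */
    static class Message {
        final long id;
        final double stat1;
        final double stat2;

        Message(long id, double stat1, double stat2) {
            this.id = id;
            this.stat1 = stat1;
            this.stat2 = stat2;
        }
    }

    /** Plays the role of the GenericWritable subclass: holds a Message or a Map. */
    static class Wrapper {
        private final Object value;

        Wrapper(Message message) { this.value = message; }
        Wrapper(Map<Long, Data> map) { this.value = map; }

        Object get() { return value; }
    }

    /** One type (Wrapper) in both signatures; the rest is handled by casting. */
    static class MapAggregator {
        private final Map<Long, Data> current = new HashMap<>();

        void aggregate(Wrapper w) {
            Object v = w.get();
            if (v instanceof Message) {
                // A single contribution from a vertex.
                Message m = (Message) v;
                current.computeIfAbsent(m.id, k -> new Data())
                       .aggregate(m.stat1, m.stat2);
            } else {
                // A whole map (assumed case): fold every entry into ours.
                @SuppressWarnings("unchecked")
                Map<Long, Data> partial = (Map<Long, Data>) v;
                for (Map.Entry<Long, Data> e : partial.entrySet()) {
                    Data d = e.getValue();
                    current.computeIfAbsent(e.getKey(), k -> new Data())
                           .aggregate(d.stat1, d.stat2);
                }
            }
        }

        Wrapper getAggregatedValue() {
            return new Wrapper(current);
        }
    }
}
```

The obvious downside is that type safety is lost: every caller has to know
which of the two wrapped types to expect and cast accordingly.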

So, I have a few main questions, I suppose:

1) Is there a better way to implement this than to use a GenericWritable as
described above? If any of you have code for your own way to do this, I'd
love to see it, and if not, I'd love to contribute what I come up with as a
MapAggregator (in a generic manner) to the Giraph project if that would be
appropriate.

2) Is there anything wrong in principle with this type of solution? In
other words, is there some kind of philosophical or design reason that
having a Map as an aggregator is a bad idea? I know that it might not end
up being very efficient, but as it stands, I'm not seeing any other
solution to my problem; if there's a more conventional workaround that
would be more efficient, I'd love to hear it.

3) [Less important and more discussion oriented] Why is the API designed
such that these methods must use the same type? It seems like having an
Aggregator<Message, Result> would be useful.
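
To make question 3 concrete, here is the kind of two-type interface I have in
mind, again as a plain-Java sketch. TwoTypeAggregator, its merge() method, and
the SumById example are all hypothetical and not part of Giraph; merge() is
there on the assumption that partial results would need to be combined
somewhere, which may be part of why the real API uses a single type.

```java
import java.util.HashMap;
import java.util.Map;

class TwoTypeSketch {

    /** Hypothetical interface: M is what vertices send, R is the result type. */
    interface TwoTypeAggregator<M, R> {
        void aggregate(M message);      // fold one vertex contribution in
        void merge(R partialResult);    // combine with another partial result
        R getAggregatedValue();
    }

    /** Example: messages are (id, value) pairs, the result is a per-id sum. */
    static class SumById
            implements TwoTypeAggregator<Map.Entry<Long, Double>, Map<Long, Double>> {

        private final Map<Long, Double> sums = new HashMap<>();

        @Override
        public void aggregate(Map.Entry<Long, Double> message) {
            sums.merge(message.getKey(), message.getValue(), Double::sum);
        }

        @Override
        public void merge(Map<Long, Double> partialResult) {
            partialResult.forEach((id, value) -> sums.merge(id, value, Double::sum));
        }

        @Override
        public Map<Long, Double> getAggregatedValue() {
            return sums;
        }
    }
}
```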

I apologize for such a long message, and I appreciate any help you can
offer. If you need any other information, please let me know and I'll be
happy to provide it. In trying to simplify everything, I could easily have
made a mistake or left out something important. Thanks in advance.

Best,
Matthew
http://www.matthewsaltz.com
