giraph-user mailing list archives

From Alessio Arleo
Subject "Local-only" aggregators
Date Tue, 24 Mar 2015 23:34:51 GMT
Hello everybody

I was wondering whether the concept of an aggregator could be extended from a “global”
to a “local-only” perspective.

Normally, aggregators cause network traffic because of the cycle: Workers -> AggregatorOwner
-> MasterAggregator -> AggregatorOwner -> Workers

What if I want to collect and aggregate values as I normally would with aggregators, but
without causing this traffic? Consider this situation:

1 - Define a custom partitioning class and let it partition the graph. This partitioning is
used to assign vertices to workers.
2 - In the computation class, every time the compute method is called on a vertex, the data
needed for the computation is stored in the vertex's neighbours but also in non-neighbouring
vertices (think of a force-directed layout algorithm, for example: computing the forces
requires the distances to both neighbouring and non-neighbouring vertices, with a different
kind of force applied to each).
Given that the compute method is running on vertex X:
	a - I pick up information from X's neighbours as I normally would (iterating over its edges
or the incoming messages).
	b - For non-neighbouring vertices, I would like to use only data from X's worker.
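To make steps a and b concrete, here is a rough, standalone sketch of what I have in mind for
a single vertex X. It is not Giraph code: the class name, the method, and the constant K are
made up for illustration, and I am assuming Fruchterman-Reingold-style forces (attraction
d^2/k towards neighbours, repulsion k^2/d away from non-neighbours), which is just one possible
choice. The point is only that the second list contains vertices held by X's own worker.

```java
import java.util.List;

// Hypothetical helper, not part of Giraph: computes the displacement of one vertex.
final class LocalForceSketch {
    // Assumed ideal edge length (the "k" of Fruchterman-Reingold-style layouts).
    static final double K = 1.0;

    // Positions are {x, y} arrays.
    // neighbours: positions obtained from edges or incoming messages (step a).
    // localNonNeighbours: positions of non-neighbours on the same worker only (step b).
    static double[] displacement(double[] self,
                                 List<double[]> neighbours,
                                 List<double[]> localNonNeighbours) {
        double dx = 0.0;
        double dy = 0.0;

        // Attraction towards each neighbour, magnitude d^2 / K.
        for (double[] n : neighbours) {
            double vx = n[0] - self[0];
            double vy = n[1] - self[1];
            double d = Math.hypot(vx, vy);
            if (d == 0.0) {
                continue; // coincident points: direction undefined, skip
            }
            double f = d * d / K;
            dx += vx / d * f;
            dy += vy / d * f;
        }

        // Repulsion away from each worker-local non-neighbour, magnitude K^2 / d.
        for (double[] o : localNonNeighbours) {
            double vx = self[0] - o[0];
            double vy = self[1] - o[1];
            double d = Math.hypot(vx, vy);
            if (d == 0.0) {
                continue;
            }
            double f = K * K / d;
            dx += vx / d * f;
            dy += vy / d * f;
        }
        return new double[] {dx, dy};
    }
}
```

Non-neighbours on other workers are simply ignored here, which is the approximation I am
asking about.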

The first thing I tried to work out before asking this question was: does this make any
sense? I may be wrong, but I think it does. If I partition my graph to maximize locality,
what I am actually trying to do is reduce network traffic as much as possible.

My concern is that if I use aggregators to achieve this, the network traffic would be heavy,
probably cancelling the advantages of the initial partitioning. What if I could access and
modify an aggregator-like local data structure in the same fashion (i.e. “getAggregatedValue”)
but without broadcasting it (assuming that I do not need the aggregator to be accessible to
every worker)? Or would it be possible to manually assign partition owners in order to minimise
network traffic (if I need to aggregate all values from vertices in partition 3 and 3 only,
I assign the partition 3 aggregator owner to the partition 3 worker)?
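As a rough idea of the "aggregator-like local data structure" I mean, here is a minimal
standalone sketch. The class and its method names are hypothetical (they only mimic the
aggregate/getAggregatedValue style) and nothing here is an existing Giraph API. I am assuming
that one instance would live per worker, for example inside a WorkerContext subclass, with
endSuperstep() called from its postSuperstep() hook; that placement is my assumption and I
have not verified it is the right one.

```java
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical worker-local sum "aggregator": never leaves the JVM of the worker.
final class LocalSumAggregator {
    // Values contributed during the current superstep. DoubleAdder is used
    // because several compute threads on the same worker may call aggregate().
    private final DoubleAdder current = new DoubleAdder();

    // Result of the previous superstep, which is what readers see
    // (mirroring the usual one-superstep delay of aggregators).
    private volatile double previous = 0.0;

    // Called from compute() for vertices on this worker.
    void aggregate(double value) {
        current.add(value);
    }

    // Returns the sum collected on this worker during the previous superstep.
    double getAggregatedValue() {
        return previous;
    }

    // Called once per worker between supersteps: publish and reset.
    void endSuperstep() {
        previous = current.sumThenReset();
    }
}
```

Since nothing is sent to an aggregator owner or to the master, the value only reflects the
vertices of the local worker, which is exactly the restriction I would accept.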

I hope for your understanding, and I hope I have caught your attention, even if only for a
brief moment. Ask me if something is not clear ;)



Ing. Alessio Arleo

PhD student in Industrial and Information Engineering

MSc in Computer and Automation Engineering
BSc in Computer and Electronic Engineering

Skype: Ing. Alessio Arleo

Tel: +39 075 5853920
Cell: +39 349 0575782

