giraph-user mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: "Local-only" aggregators
Date Wed, 25 Mar 2015 17:25:05 GMT
Hi,

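I'm not sure aggregators necessarily require high traffic. Values are
aggregated locally on each worker before they are aggregated on the
(corresponding) master worker, so a worker should only need to send one
partial result per aggregator rather than one value per vertex.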
Anyway, assuming you want to proceed, my understanding is that you want
vertices on the same worker to share (aggregated) information. In that
case, I'd suggest just using a WorkerContext.
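
As a rough sketch of the idea, here is a worker-local sum in plain Java. It
is not Giraph code: the class LocalSum and its method names are made up for
illustration. In a real job this state would presumably be a field of your
WorkerContext subclass, vertices would reach it through getWorkerContext(),
and the swap would happen in one of the per-superstep hooks such as
postSuperstep(). The adder is thread-safe on the assumption that a worker may
run several compute threads.

```java
import java.util.concurrent.atomic.DoubleAdder;

/**
 * Hypothetical worker-local sum (illustration only, not a Giraph class).
 * Values added during superstep S become readable during superstep S+1,
 * which mirrors how regular aggregators behave, but nothing leaves the worker.
 */
public class LocalSum {
    // Accumulates contributions from all compute threads on this worker.
    private final DoubleAdder current = new DoubleAdder();
    // Result of the previous superstep, read-only while vertices compute.
    private volatile double previous = 0.0;

    /** Called from compute(): contribute a value to the running local sum. */
    public void aggregate(double value) {
        current.add(value);
    }

    /** Called from compute(): read the sum produced in the previous superstep. */
    public double getPreviousValue() {
        return previous;
    }

    /** Called once per worker between supersteps: publish the sum and reset. */
    public void endSuperstep() {
        previous = current.sumThenReset();
    }

    public static void main(String[] args) {
        LocalSum sum = new LocalSum();
        sum.aggregate(1.5);   // vertex A on this worker
        sum.aggregate(2.5);   // vertex B on this worker
        System.out.println(sum.getPreviousValue()); // nothing published yet
        sum.endSuperstep();
        System.out.println(sum.getPreviousValue()); // local sum of A and B
    }
}
```

Since the object never leaves the worker, reading and updating it costs no
network traffic; the trade-off is that each worker only sees its own vertices.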

Hope this helps.
Claudio

On Wed, Mar 25, 2015 at 12:47 AM Alessio Arleo <ingarleo@icloud.com> wrote:

> Hello everybody
>
> I was wondering if it was possible to extend the concept of aggregator
> from a “global” to a “local-only” perspective.
>
> Normally, aggregators DO cause network traffic because of the cycle:
> Workers -> AggregatorOwner -> MasterAggregator -> AggregatorOwner -> Workers
>
> What if I’d like to fetch and aggregate values as I would normally do with
> aggregators but without causing this traffic? Let’s assume this situation:
>
> 1 - Define a custom partitioning class and let it partition the graph.
> This is the partition used to assign vertices to workers.
> 2 - In the computation class, every time the compute method is called on a
> vertex, the data needed for the computation is stored in the vertex's
> neighbours but also in non-neighbouring vertices (think of a force-directed
> layout algorithm, for example: computing the forces requires the distances
> to both neighbouring and non-neighbouring vertices, with a different kind
> of force applied to each).
> Given that the compute class is computing on vertex X:
> a - I pick information from X neighbours as I would normally do (iterating
> its edges or the incoming messages)
> b - When it comes to non-neighbouring vertices I would like to use data
> from X worker only.
>
> The first thing I tried to understand before asking this question was:
> does this make any sense? I may be wrong, but I think it does. If I
> partition my graph to maximize locality, what I am actually trying to do is
> to reduce the network traffic as much as possible.
>
> My doubt is that if I use aggregators to achieve the result the network
> traffic would be heavy, probably losing the advantages of the initial
> partitioning. What if I could access and modify an aggregator-like local
> data structure in the same fashion (i.e. “getAggregatedValue”) but without
> broadcasting it (assuming that I do not need the aggregator to be
> accessible to every worker)? Or would it be possible to manually assign
> partition owners in order to minimise network traffic (if I need to
> aggregate all values from vertices in partition 3 and 3 only, I assign the
> partition 3 aggregator owner to partition 3 worker)?
>
> I hope this is understandable and that I caught your attention, even if
> only for a brief moment. Let me know if anything is unclear ;)
>
> Cheers!
>
> ~~~~~~~~~~~~~~~~~~~
>
> Ing. Alessio Arleo
>
> Dottorando in Ingegneria Industriale e dell’Informazione
>
> Dottore Magistrale in Ingegneria Informatica e dell’Automazione
> Dottore in Ingegneria Informatica ed Elettronica
>
> Linkedin: it.linkedin.com/in/IngArleo
> Skype: Ing. Alessio Arleo
>
> Tel: +39 075 5853920
> Cell: +39 349 0575782
>
> ~~~~~~~~~~~~~~~~~~~
