giraph-dev mailing list archives

From "Claudio Martella (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-388) Improve the way we keep outgoing messages
Date Mon, 29 Oct 2012 21:46:12 GMT


Claudio Martella commented on GIRAPH-388:

Good work, Maja. You got me thinking, and I think your results make a lot of sense. With neighborhoods
of 100 vertices and 40 workers, you'd expect slightly over 2 neighbouring vertices to fall in
the same partition (100/39). This means that, even if we didn't stream messages out with
buffering but kept them all in memory, we'd save one message in every two. If you consider
that we buffer a bit but flush messages as they are produced, the number of combined messages
is basically zero.
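To make the arithmetic above easy to re-run with other numbers, here is a minimal sketch. It assumes vertices are spread uniformly at random over the workers, which is my simplification and not something measured in Giraph; the class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate only. Assumption: each neighbour lands on any
// given worker with equal probability (uniform random partitioning).
final class LocalNeighbourEstimate {

    /** Expected number of a vertex's neighbours that land on one given worker. */
    static double expectedPerWorker(int degree, int workers) {
        if (degree < 0 || workers <= 0) {
            throw new IllegalArgumentException("degree must be >= 0 and workers > 0");
        }
        return (double) degree / workers;
    }

    public static void main(String[] args) {
        int degree = 100;  // neighbourhood size from the discussion
        int workers = 40;  // number of workers from the discussion

        // Spreading the neighbours over all workers.
        System.out.println("over " + workers + " workers: "
                + expectedPerWorker(degree, workers));
        // Spreading them over the other workers only, as in the 100/39 figure.
        System.out.println("over " + (workers - 1) + " workers: "
                + expectedPerWorker(degree, workers - 1));
    }
}
```

Either way of counting gives a value between 2 and 3, so only a handful of messages per destination could ever meet in the same sender-side buffer.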

This makes a lot of sense if you consider the original idea of the combiner in MapReduce.
There, the cardinality of the key set of the original input is usually much higher than
that of the intermediate set that you feed to the reducer (otherwise you wouldn't be reducing,
right?). THERE, the combiner makes a lot of sense. Yes, we still have the same advantage of
using a combiner as with PageRank on MapReduce, because there the cardinalities are the same
as well (but the number of messages is higher; in fact the complexity is O(E), hence the combiner
makes some sense). But the architecture of the shuffle and sort makes the cost of applying
the combiner cheaper (amortized) than it is for us.

I'm more and more convinced that the role of the combiner is mostly to save memory rather than
anything else, so it should mainly be used server-side.

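A rough sketch of what I mean by saving memory on the server side. This is not Giraph's actual message store; CombiningInbox and its methods are hypothetical, and I assume a sum combiner over double-valued messages, as you would use for PageRank.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical receiver-side store, NOT a Giraph class. It keeps one combined
// value per destination vertex instead of a list of every incoming message.
final class CombiningInbox {
    private final Map<Long, Double> combined = new HashMap<>();
    private long received = 0;

    /** Folds an incoming message into the value already held for its destination. */
    void receive(long destinationVertexId, double message) {
        received++;
        // Assumed combiner: sum. merge() stores the message if the key is new,
        // otherwise replaces the old value with old + message.
        combined.merge(destinationVertexId, message, Double::sum);
    }

    /** Number of values actually held in memory. */
    int storedValues() {
        return combined.size();
    }

    /** Number of messages that arrived, combined or not. */
    long receivedMessages() {
        return received;
    }

    /** Combined value for a vertex, or 0.0 if nothing arrived for it. */
    double valueFor(long destinationVertexId) {
        return combined.getOrDefault(destinationVertexId, 0.0);
    }

    public static void main(String[] args) {
        CombiningInbox inbox = new CombiningInbox();
        inbox.receive(1L, 0.25);
        inbox.receive(1L, 0.5);
        inbox.receive(2L, 1.0);
        inbox.receive(1L, 0.25);
        System.out.println(inbox.receivedMessages() + " messages received, "
                + inbox.storedValues() + " values stored");
    }
}
```

The receiver sees every message addressed to a vertex, from all workers, so memory stays bounded by the number of destination vertices no matter how the senders buffer and flush.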
> Improve the way we keep outgoing messages
> -----------------------------------------
>                 Key: GIRAPH-388
>                 URL:
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-388.patch
> As per the discussion on GIRAPH-357, in standard applications the chances that we get to use the
> client-side combiner are very low. I experimented with the benefits we can get from not having
> the client-side combiner at all. It turns out that having a lot of maps in SendMessageCache, and
> then a collection inside each of them, really hurts performance.
