giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-357) Don't try to combine if there is only one message
Date Mon, 08 Oct 2012 22:44:03 GMT


Eli Reisman commented on GIRAPH-357:

I have not used the combining features much. If you are combining messages on the sender
side, the benefit should come only if the messages in SendMessageCache build up long enough
that enough of them are queued to the same destination to make combining worthwhile, right?
The more often you send a burst of cached messages, the less often the combiner will have
accumulated enough messages to combine in a way that actually saves us some resources. And
the kind of combining operation involved, and whether the messages getting into the cache
are combinable at all, differs from algorithm to algorithm?
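
To make the flush-frequency point concrete, here is a toy simulation, not Giraph code. It
assumes destinations arrive round-robin and that the combiner collapses all messages in one
flushed batch that share a destination into a single message. The class and method names are
hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a sender-side cache; hypothetical, not part of Giraph. */
class FlushSizeSimulation {
    /**
     * Counts the messages that leave the sender when the cache is flushed
     * every batchSize messages and each flushed batch is combined down to
     * one message per distinct destination.
     */
    public static int messagesSent(int totalMessages, int numDestinations, int batchSize) {
        int sent = 0;
        int queued = 0;
        Set<Integer> destinationsInBatch = new HashSet<>();
        for (int i = 0; i < totalMessages; i++) {
            // Assumption: destinations are visited round-robin.
            destinationsInBatch.add(i % numDestinations);
            queued++;
            if (queued == batchSize) {
                sent += destinationsInBatch.size();
                destinationsInBatch.clear();
                queued = 0;
            }
        }
        // Final flush of whatever is left at the end of the superstep.
        sent += destinationsInBatch.size();
        return sent;
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {1, 10, 100, 1000}) {
            System.out.println("batch " + batchSize + " -> "
                + messagesSent(1000, 10, batchSize) + " messages sent");
        }
    }
}
```

Under these assumptions, with 1000 messages to 10 destinations, batches of 1 or 10 save nothing
(1000 messages sent), while batches of 100 send 100 and a single batch of 1000 sends 10. Real
message streams will behave differently, which is the per-algorithm part of the question.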

So, in my naive view of combining, it seems:

1. Running the combiner function on bundles of outgoing messages from the cache to a given
worker might need to be tuned per-application?

2. Running it below some threshold of # of messages-per-outgoing-cache-bundle will always
be silly/inefficient, such as combining on every 1-message send. BTW: when do we ever (in
the current form) send just one message at a time? It seems like this could only happen on
the final flush of the cache at the end of a superstep?
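
A minimal sketch of the threshold idea from the two points above, assuming the combiner can be
modeled as a binary function over messages. ThresholdCombiner, maybeCombine and
minMessagesToCombine are hypothetical names for illustration, not Giraph's API and not the
attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical helper; the names here are not Giraph's API. */
class ThresholdCombiner {
    /**
     * Returns the messages to send to one destination. The combiner runs only
     * when at least minMessagesToCombine messages are queued (and never for
     * fewer than two); otherwise the original list is returned untouched, so
     * no new list or message object is created.
     */
    public static <M> List<M> maybeCombine(List<M> queued, BinaryOperator<M> combiner,
                                           int minMessagesToCombine) {
        if (queued.size() < Math.max(2, minMessagesToCombine)) {
            return queued;
        }
        M combined = queued.get(0);
        for (int i = 1; i < queued.size(); i++) {
            combined = combiner.apply(combined, queued.get(i));
        }
        List<M> result = new ArrayList<>(1);
        result.add(combined);
        return result;
    }

    public static void main(String[] args) {
        List<Double> one = Arrays.asList(0.25);
        List<Double> many = Arrays.asList(0.25, 0.5, 0.25);
        // Single message: same list instance comes back, combiner is skipped.
        System.out.println(maybeCombine(one, Double::sum, 2) == one);
        // Three messages: combined into one sum.
        System.out.println(maybeCombine(many, Double::sum, 2));
    }
}
```

With a threshold of 2 this reduces to the single-message case in the issue title; a larger
value would be the per-application tuning knob.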

So...would this be something we would tune with the "# of cached messages per-worker before
flushing cache" GiraphConfiguration dash-D option, per application, rather than in code, assuming
this algorithm needs a client-side combiner? If we always send X number of messages, the combiner
should always have X or so to work with in the hopes of matching and reducing a few before
sending.

When you say "serialize" do you mean on the network, or spill to disk for later sending at
the end of the superstep? I'm assuming the former? One of the things I have been very aware
of during GIRAPH-328/322 is the fact that it's one thing to carefully keep a single reference
to something on the send side (for example) but it's entirely another to innocently serialize
it and end up with N unique copies at the far end of the deserialization. Is there some overarching
idea here about how to minimize this? One thing I like about the idea (just the idea so far!)
of 322 is that on both the client and recv sides, the original message reference can be shared
without N copies being created during ser/deser.
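
The "N unique copies" effect can be shown with a small stand-alone sketch. It assumes a
hand-written binary format in the style of a write/read pair over DataOutputStream and
DataInputStream; the Message class and method names below are hypothetical and are not
Giraph's message types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class SharedReferenceDemo {
    /** Hypothetical message type with hand-written binary ser/deser. */
    static final class Message {
        final double value;
        Message(double value) { this.value = value; }
        void write(DataOutputStream out) throws IOException { out.writeDouble(value); }
        static Message read(DataInputStream in) throws IOException {
            return new Message(in.readDouble());
        }
    }

    /** Writes the same Message instance `copies` times, then reads them all back. */
    public static List<Message> roundTrip(Message shared, int copies) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                for (int i = 0; i < copies; i++) {
                    shared.write(out); // one reference on the send side
                }
            }
            List<Message> received = new ArrayList<>();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                for (int i = 0; i < copies; i++) {
                    received.add(Message.read(in)); // a fresh object per read
                }
            }
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts distinct object instances by identity, not by equals(). */
    public static int distinctInstances(List<Message> messages) {
        Set<Message> identity =
            Collections.newSetFromMap(new IdentityHashMap<Message, Boolean>());
        identity.addAll(messages);
        return identity.size();
    }

    public static void main(String[] args) {
        Message shared = new Message(0.15);
        System.out.println(distinctInstances(roundTrip(shared, 3)));
    }
}
```

One object on the send side becomes three on the receive side here, because each read
allocates a new instance; avoiding that needs either sending the payload once with a list of
destinations or deduplicating on receipt.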

> Don't try to combine if there is only one message
> -------------------------------------------------
>                 Key: GIRAPH-357
>                 URL:
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-357.patch
> In SendMessageCache, we call combiner even if we have just one message. Combining is
> kind of expensive since we recreate the message object and the list. With default settings
> and bigger graph, for PageRankBenchmark there is 10-15% superstep speedup if we don't call
> it when we have a single message.
