giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-388) Improve the way we keep outgoing messages
Date Mon, 29 Oct 2012 19:40:13 GMT


Eli Reisman commented on GIRAPH-388:

I wasn't really associating 314 with this patch, don't know...? The advantage with the maps
of maps approach in 328 was that we took pains not to needlessly duplicate data in the data
structures when we could reference the same message (or whatever) multiple times. I think
the ideas in 322 (way outdated now) are more in opposition to this new approach than 314.
I was comparing this patch with the 328 patch.

As we move forward I hope we don't re-introduce a bunch of data duplication, but instead move
towards eliminating it from the data structures and the disk spill format. If we trade speed
for space too often during this process, we will be hurting the in-memory use cases to favor
the disk-spill cases.
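
To make the trade-off above concrete, here is a minimal Java sketch. It is not the code from
any of the patches discussed and not Giraph's SendMessageCache; the class and method names
(OutgoingMessageSketch, NestedCache, FlatCache) are hypothetical, and the flat layout is only
an assumption about what a map-light alternative could look like. The nested cache keeps a
reference to one message object for every destination, while the flat cache copies the
serialized payload per destination.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; names and layouts are assumptions. */
public class OutgoingMessageSketch {

    /** Nested layout: partition -> destination vertex id -> message references. */
    static final class NestedCache<M> {
        private final Map<Integer, Map<Long, List<M>>> byPartition = new HashMap<>();

        void add(int partition, long vertexId, M message) {
            byPartition
                .computeIfAbsent(partition, p -> new HashMap<>())
                .computeIfAbsent(vertexId, v -> new ArrayList<>())
                .add(message); // stores a reference, not a copy
        }

        /** Counts distinct message objects by identity, not by equals(). */
        int distinctMessageObjects() {
            Set<M> seen = Collections.newSetFromMap(new IdentityHashMap<>());
            for (Map<Long, List<M>> perVertex : byPartition.values()) {
                for (List<M> messages : perVertex.values()) {
                    seen.addAll(messages);
                }
            }
            return seen.size();
        }
    }

    /** Flat layout: one list per partition, each entry owning its own payload copy. */
    static final class FlatCache {
        static final class Entry {
            final long vertexId;
            final byte[] payload;

            Entry(long vertexId, byte[] payload) {
                this.vertexId = vertexId;
                this.payload = payload;
            }
        }

        private final Map<Integer, List<Entry>> byPartition = new HashMap<>();

        void add(int partition, long vertexId, byte[] serialized) {
            byPartition
                .computeIfAbsent(partition, p -> new ArrayList<>())
                .add(new Entry(vertexId, serialized.clone())); // duplicates the bytes
        }

        /** Total payload bytes held across all entries. */
        long payloadBytes() {
            long total = 0;
            for (List<Entry> entries : byPartition.values()) {
                for (Entry entry : entries) {
                    total += entry.payload.length;
                }
            }
            return total;
        }
    }

    public static void main(String[] args) {
        String message = "hello";
        byte[] serialized = message.getBytes(StandardCharsets.UTF_8);

        NestedCache<String> nested = new NestedCache<>();
        FlatCache flat = new FlatCache();
        for (long vertexId = 1; vertexId <= 3; vertexId++) {
            nested.add(0, vertexId, message);
            flat.add(0, vertexId, serialized);
        }

        System.out.println("distinct objects (nested): " + nested.distinctMessageObjects());
        System.out.println("payload bytes (flat): " + flat.payloadBytes());
    }
}
```

With one message sent to three vertices, the nested cache holds a single message object and
the flat cache holds three copies of the payload. The sketch says nothing about speed; it only
shows where the duplication would come from if the flat layout copies per destination.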

As you do a larger redesign, be careful to evaluate on real data rather than the benchmarks;
the behavior is so different and the benchmarks are so forgiving. The impressions you get
of performance will be night and day with real social graph data. This will be reflected in
many more facets than just the frequency of duplicated vertex IDs in the VIMC. There are
parts of the code I'm not willing to touch until I have a good-sized cluster to run real
data on.

There is so much change in this area of the codebase right now (and I have been so busy) that
I have let 322 and 314 lie fallow for a while. I think I will hold off on working on them
until I see what you guys have in mind for this part of the code. Maybe there won't be a need
for either of them!

> Improve the way we keep outgoing messages
> -----------------------------------------
>                 Key: GIRAPH-388
>                 URL:
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-388.patch
> As per the discussion on GIRAPH-357, in standard applications the chances that we get to use
> the client-side combiner are very low. I experimented with the benefits we can get from not
> having the client-side combiner at all. It turns out that having a lot of maps in
> SendMessageCache, and then a collection inside each of them, really hurts performance.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
