giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-388) Improve the way we keep outgoing messages
Date Mon, 29 Oct 2012 23:06:12 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486468#comment-13486468 ]

Eli Reisman commented on GIRAPH-388:
------------------------------------

I agree completely with Claudio on the client-side combiner; the MR comparison frames the
difference perfectly. And I agree that this is great work. I love seeing simplifications like
this, and it makes sense why it works so well.

My concern with the social graph is that it's lumpy: I saw all sorts of performance degradation
and general behavior quirks running jobs with a social graph that I just never saw with the
benchmarks. The kind of duplication problems I'm talking about might happen when, for example,
a supernode belongs to the receiving partition for some messages. Either way, I'm glad the
results were positive. Nice job again!

The flushing issue bothered me while I was working on GIRAPH-328 and GIRAPH-322 as well. Flushing
needs to happen for the in-memory use cases I'm most familiar with, but it works at cross-purposes
to deduplicating combinable messages. I'd love to see Giraph get smarter about tuning the
data-per-flush parameters in general: they are tricky to tune per job and have a large effect
on performance. Users I know have lost hope trying to tune these sorts of params by hand.
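
To make the flush-versus-combine tension concrete, here is a minimal sketch. It is not
Giraph's SendMessageCache; the class FlushVsCombine, the method messagesSent, and the
entry-count threshold are all hypothetical stand-ins for whatever data-per-flush limit a job
uses. It only counts how many messages leave the sender when a sum-style combiner runs on a
buffer that is flushed at a given size.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Giraph code: a sender-side buffer that sums
 * messages per destination vertex and flushes once it holds a fixed
 * number of distinct destinations.
 */
public class FlushVsCombine {

    /** Returns how many messages go over the wire for a given flush threshold. */
    public static int messagesSent(long[] destinations, int maxBufferedEntries) {
        Map<Long, Double> buffer = new HashMap<>();
        int sent = 0;
        for (long dest : destinations) {
            // Combine with any message already buffered for the same destination.
            buffer.merge(dest, 1.0, Double::sum);
            if (buffer.size() >= maxBufferedEntries) {
                // Flush: one combined message per distinct destination.
                sent += buffer.size();
                buffer.clear();
            }
        }
        // Final flush of whatever is left when the sender is done.
        return sent + buffer.size();
    }

    public static void main(String[] args) {
        // 12 messages addressed round-robin to vertices 1, 2 and 3.
        long[] destinations = {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3};
        System.out.println(messagesSent(destinations, 2));   // prints 12
        System.out.println(messagesSent(destinations, 100)); // prints 3
    }
}
```

With the small threshold the buffer is emptied before a repeated destination ever shows up, so
nothing gets combined; with the large one, all twelve messages collapse into three. Real jobs
sit somewhere in between, which is part of why these limits are hard to pick by hand.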

                
> Improve the way we keep outgoing messages
> -----------------------------------------
>
>                 Key: GIRAPH-388
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-388
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-388.patch
>
>
> As per discussion on GIRAPH-357, in standard application chances that we get to use client-side
> combiner are very low. I experimented with benefits which we can get from not having the client-side
> combiner at all. It turns out that having a lot of maps in SendMessageCache, and then collection
> inside each of them, really hurts the performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
