giraph-dev mailing list archives

From "Maja Kabiljo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex
Date Thu, 06 Sep 2012 14:54:07 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449716#comment-13449716 ]

Maja Kabiljo commented on GIRAPH-314:
-------------------------------------

Great change; the javadoc is much easier to understand now.

The two options I mentioned should prevent us from generating new messages until enough
of the current messages have been processed. So if we use out-of-core messages, they shouldn't
be able to pile up. With those options I was able to run RandomMessageBenchmark with a really
huge number of messages (it was slow, of course, but it worked). I'm surprised to hear it
didn't work for you.
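
For anyone unfamiliar with the flow-control idea, here is a generic sketch of the principle using a bounded java.util.concurrent queue. It is not how Giraph implements the two options mentioned above (they are not named in this thread); BackpressureSketch, run, and the capacity value are hypothetical. It only shows why a producer that blocks once a fixed number of messages is unprocessed keeps the backlog bounded.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical flow-control sketch (not Giraph's actual options): the producer
 * blocks once `capacity` messages are waiting, so unprocessed messages cannot
 * pile up without bound.
 */
final class BackpressureSketch {
    /** Returns {sum of processed messages, largest backlog observed}. */
    static long[] run(int capacity, int totalMessages) throws InterruptedException {
        BlockingQueue<Integer> pending = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < totalMessages; i++) {
                    pending.put(i); // blocks while `capacity` messages are unprocessed
                }
                pending.put(-1);    // sentinel: producer is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        long sum = 0;
        int maxBacklog = 0;
        while (true) {
            maxBacklog = Math.max(maxBacklog, pending.size());
            int msg = pending.take(); // consumer processes one message at a time
            if (msg < 0) {
                break;
            }
            sum += msg;
        }
        producer.join();
        return new long[] {sum, maxBacklog};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = run(4, 100);
        System.out.println("sum=" + r[0] + " maxBacklog=" + r[1]);
    }
}
```

However many messages are produced, the observed backlog never exceeds the queue capacity; total throughput is limited by the consumer, which matches the "slow, but it worked" experience.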

I'm not sure we are thinking of the same combiner. Correct me if I'm wrong, but the reason
amortizing saves you is that you get to process some of the messages before receiving new
ones. Processing messages decreases memory use simply by replacing several occurrences of
one second-degree neighbour with a single occurrence count. That's what a combiner should
also do.
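
A minimal sketch of the counting such a combiner could do, written as plain Java rather than against Giraph's combiner API (whose exact signature is not shown in this thread). NeighbourCountCombiner, combine, and mergeCounts are hypothetical names, and each message is assumed to be a list of second-degree neighbour IDs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical combiner sketch: folds messages (lists of second-degree
 * neighbour IDs) into a map from neighbour ID to occurrence count.
 */
final class NeighbourCountCombiner {
    /** Folds one raw message into the running counts. */
    static void combine(Map<Integer, Integer> counts, List<Integer> message) {
        for (int id : message) {
            counts.merge(id, 1, Integer::sum); // a repeated ID only bumps a counter
        }
    }

    /** Merges two partial results; the order of merging does not change the result. */
    static void mergeCounts(Map<Integer, Integer> into, Map<Integer, Integer> other) {
        other.forEach((id, c) -> into.merge(id, c, Integer::sum));
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = new TreeMap<>();
        combine(counts, Arrays.asList(7, 9));
        combine(counts, Arrays.asList(7, 12));
        Map<Integer, Integer> other = new TreeMap<>();
        combine(other, Arrays.asList(7, 9));
        mergeCounts(counts, other);
        System.out.println(counts); // {7=3, 9=2, 12=1}
    }
}
```

Because merging counts is associative and commutative, partial results can be combined in any order as messages arrive, which is what lets a combiner shrink memory use before all messages are in.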

So you are planning to change the infrastructure to better support sending the same message
to several vertices on the same worker? So that in practice we only send the message once
along with the list of destination vertices, and the destination worker holds only one
copy of the message? That sounds like a really good improvement for this and similar applications
where messages are big objects. If messages are not combinable and we had good partitioning,
this could significantly reduce the amount of traffic and memory usage here.
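
A rough sketch of the proposed grouping, assuming a simple id mod numWorkers vertex-to-worker mapping in place of Giraph's real partitioner. MessageFanOut, WorkerRequest, and groupByWorker are hypothetical names; the point is only that each worker receives one copy of the payload plus a list of target vertex IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: one request per destination worker, carrying a single
 * copy of the message and the IDs of all target vertices on that worker.
 */
final class MessageFanOut {
    /** One network request: a shared payload and the vertices it is addressed to. */
    static final class WorkerRequest<M> {
        final M message;
        final List<Integer> targets = new ArrayList<>();

        WorkerRequest(M message) {
            this.message = message;
        }
    }

    static <M> Map<Integer, WorkerRequest<M>> groupByWorker(
            M message, List<Integer> targetIds, int numWorkers) {
        Map<Integer, WorkerRequest<M>> byWorker = new TreeMap<>();
        for (int id : targetIds) {
            int worker = Math.floorMod(id, numWorkers); // assumed hash partitioning
            byWorker.computeIfAbsent(worker, w -> new WorkerRequest<>(message))
                    .targets.add(id);
        }
        return byWorker;
    }

    public static void main(String[] args) {
        List<Integer> neighbours = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        Map<Integer, WorkerRequest<String>> requests =
                groupByWorker("big-payload", neighbours, 3);
        requests.forEach((worker, req) ->
                System.out.println("worker " + worker + " -> " + req.targets));
        // worker 0 -> [3, 6]
        // worker 1 -> [1, 4, 7]
        // worker 2 -> [2, 5]
    }
}
```

Here seven target vertices on three workers need three copies of the payload instead of seven; the more targets that land on the same worker, the bigger the saving, which is why good partitioning matters.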
                
> Implement better message grouping to improve performance in SimpleTriangleClosingVertex
> ---------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-314
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-314
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Trivial
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-314-1.patch, GIRAPH-314-2.patch, GIRAPH-314-3.patch, GIRAPH-314-4.patch
>
>
> After running SimpleTriangleClosingVertex at scale, I'm thinking that sendMessageToAllEdges()
> makes for pretty code, but it's not a good idea in practice, since in this algorithm each vertex V
> sends degree(V)^2 messages right in the first superstep. We could do something with a combiner,
> etc., but simply grouping messages by hand at the application level using IntArrayListWritable
> again does the trick.
> I probably should have just done it this way before, but sendMessageToAllEdges() looked
> so nice. Sigh. I changed the unit tests to reflect this new approach; it passes mvn verify and
> runs on the cluster, etc.
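
The message-count difference described in the issue can be illustrated in plain Java. This does not use Giraph's API or IntArrayListWritable; TriangleClosingMessages and its Message class are hypothetical stand-ins that only count messages and carried IDs for a single vertex in the first superstep.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch comparing per-ID messages with one grouped list per neighbour. */
final class TriangleClosingMessages {
    /** A message: destination vertex plus the vertex IDs it carries. */
    static final class Message {
        final int target;
        final int[] payload;

        Message(int target, int[] payload) {
            this.target = target;
            this.payload = payload;
        }
    }

    /** Ungrouped: every neighbour ID goes to every neighbour, one ID per message. */
    static List<Message> ungrouped(int[] neighbours) {
        List<Message> sent = new ArrayList<>();
        for (int id : neighbours) {
            for (int target : neighbours) {
                sent.add(new Message(target, new int[] {id}));
            }
        }
        return sent; // degree^2 messages
    }

    /** Grouped: the whole neighbour list goes to each neighbour as one message. */
    static List<Message> grouped(int[] neighbours) {
        List<Message> sent = new ArrayList<>();
        for (int target : neighbours) {
            sent.add(new Message(target, neighbours.clone())); // each message has its own copy
        }
        return sent; // degree messages
    }

    /** Total number of vertex IDs carried across all messages. */
    static int totalIds(List<Message> messages) {
        int total = 0;
        for (Message m : messages) {
            total += m.payload.length;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] neighbours = {2, 3, 5, 8};
        List<Message> a = ungrouped(neighbours);
        List<Message> b = grouped(neighbours);
        System.out.println("ungrouped: " + a.size() + " messages, " + totalIds(a) + " IDs");
        System.out.println("grouped:   " + b.size() + " messages, " + totalIds(b) + " IDs");
        // ungrouped: 16 messages, 16 IDs
        // grouped:   4 messages, 16 IDs
    }
}
```

Both versions carry degree(V)^2 IDs in total; grouping cuts the number of messages from degree(V)^2 to degree(V), so the saving comes from per-message overhead rather than raw payload.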

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
