giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex
Date Thu, 13 Sep 2012 18:12:08 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455093#comment-13455093
] 

Eli Reisman commented on GIRAPH-314:
------------------------------------

No problem, I welcome the input. A combiner is not needed at the beginning, or is at most one
extra step on the sending side, because we already combine the messages by using IntArrayListWritable
instead of many IntWritables right from the get-go. On the receiving side, combiners don't
help us much because we still have huge numbers of extra messages coming in over Netty
all the time, as long as they are serialized and deserialized organized around Partition ->
vertexid -> List<M>, and that's what GIRAPH-322 addresses.
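
To make the grouping concrete, here is a minimal sketch in plain Java of the difference in
message counts. It is not the code from the attached patches and uses no Giraph classes: the
class and method names are hypothetical, ordinary Java lists stand in for IntWritable and
IntArrayListWritable, and neighbor ids are assumed to be distinct.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Giraph.
final class MessageGroupingSketch {

    /** One message per (target neighbor, neighbor id) pair: degree^2 messages. */
    static List<int[]> individualMessages(int[] neighborIds) {
        List<int[]> messages = new ArrayList<>();
        for (int target : neighborIds) {
            for (int id : neighborIds) {
                messages.add(new int[] {target, id});
            }
        }
        return messages;
    }

    /** One message per target neighbor, carrying the whole id list: degree messages. */
    static Map<Integer, List<Integer>> groupedMessages(int[] neighborIds) {
        List<Integer> allIds = new ArrayList<>();
        for (int id : neighborIds) {
            allIds.add(id);
        }
        Map<Integer, List<Integer>> messages = new LinkedHashMap<>();
        for (int target : neighborIds) {
            // Every target maps to the same list instance (a single shared reference).
            messages.put(target, allIds);
        }
        return messages;
    }

    public static void main(String[] args) {
        int[] neighbors = {11, 12, 13, 14};
        System.out.println("individual: " + individualMessages(neighbors).size());
        System.out.println("grouped:    " + groupedMessages(neighbors).size());
    }
}
```

For a vertex with four neighbors this gives 16 individual messages versus 4 grouped ones. In
this sketch the grouped payload is a single list instance on the sending side.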

As for the message limiting, as long as the sender does not keep iterating on compute() and
we don't overwhelm the sender that way, it's a great idea. But once we serialize and deserialize
to disk or anywhere else, we lose the single reference to each message and get back individual
objects, which then have to be put into a sender-side combiner or other extra plumbing, or
just sent out duplicated over Netty. And we're talking about degree(V)^2 messages for all V
in G(V), so it's a lot to churn through in one superstep. The amortizing is fast, and by avoiding
the disk we leave open the possibility for GIRAPH-322 to manage the message growth without
serializing and deserializing and ending up with a bunch of instances to send over the wire
again, or with random access on the disk. So I'm not convinced 314 + 322 are a good alternative,
but they seem worth exploring at this point. If it turns out the only way to make large jobs
on an application like 314 run to completion is to focus on spill to disk entirely, I will
certainly embrace that route.



                
> Implement better message grouping to improve performance in SimpleTriangleClosingVertex
> ---------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-314
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-314
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Trivial
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-314-1.patch, GIRAPH-314-2.patch, GIRAPH-314-3.patch, GIRAPH-314-4.patch
>
>
> After running SimpleTriangleClosingVertex at scale, I'm thinking that sendMessageToAllEdges()
> looks pretty in the code, but it's not a good idea in practice, since each vertex V sends
> degree(V)^2 messages right in the first superstep of this algorithm. We could do something with
> a combiner etc., but just grouping messages by hand at the application level by using
> IntArrayListWritable again does the trick fine.
> Probably should have just done it this way before, but sendMessageToAllEdges() looked
> so nice. Sigh. Changed unit tests to reflect this new approach; passes mvn verify and cluster,
> etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
