giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex
Date Wed, 05 Sep 2012 22:28:07 GMT


Eli Reisman commented on GIRAPH-314:

Changing the "M" type in this patch from IntWritable to IntArrayListWritable, combined with setting hash.userPartitionCount == the number of workers, is effectively the combiner minus the overhead. Our combiners want to aggregate or concatenate messages; by using an array message type we skip that stage entirely with the same effect. I was hoping to use a combiner originally, but this is kind of the nightmare use case for Giraph in general.
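As a rough sketch of that idea (hypothetical names, plain Java collections standing in for the Writable types, not actual Giraph code): a vertex with d neighbors that sends each neighbor id as its own single-int message emits d*(d-1) messages, while packing the whole payload into one array-valued message per target drops that to d messages, which is what a concatenating combiner would have produced anyway.

```java
import java.util.*;

// Hypothetical sketch, not Giraph code: plain ints and Lists stand in for
// IntWritable / IntArrayListWritable. A vertex tells each neighbor about
// all of its other neighbors (the triangle-closing exchange).
public class GroupedMessages {

    // Naive: one single-int message per (target, payload) pair.
    static Map<Integer, List<Integer>> naive(List<Integer> neighbors) {
        Map<Integer, List<Integer>> outbox = new HashMap<>();
        for (int target : neighbors) {
            for (int other : neighbors) {
                if (other != target) {
                    outbox.computeIfAbsent(target, k -> new ArrayList<>()).add(other);
                }
            }
        }
        return outbox;
    }

    // Grouped: one array-valued message per target; the concatenation a
    // combiner would do happens up front on the sending side.
    static Map<Integer, List<List<Integer>>> grouped(List<Integer> neighbors) {
        Map<Integer, List<List<Integer>>> outbox = new HashMap<>();
        for (int target : neighbors) {
            List<Integer> payload = new ArrayList<>();
            for (int other : neighbors) {
                if (other != target) payload.add(other);
            }
            outbox.computeIfAbsent(target, k -> new ArrayList<>()).add(payload);
        }
        return outbox;
    }

    // Count message objects in an outbox, regardless of payload shape.
    static int messageCount(Map<Integer, ? extends List<?>> outbox) {
        int n = 0;
        for (List<?> msgs : outbox.values()) n += msgs.size();
        return n;
    }

    public static void main(String[] args) {
        List<Integer> neighbors = Arrays.asList(2, 3, 4, 5);
        System.out.println(messageCount(naive(neighbors)));   // 12 single-int messages
        System.out.println(messageCount(grouped(neighbors))); // 4 array messages
    }
}
```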

A combiner won't work alone because of the E^2 message growth as I scale the input data up to size, and Giraph is more or less hardcoded to combine and interpret "PartitionId -> VertexId -> List<M>". What I need to get algorithms like this running at scale is PartitionId mapping to Message mapping to the set of destination vertices within the partition. The messaging I'm using in "part 2" does a sort of "PartitionId -> M -> Set<I>" run-length encoding over Netty to achieve this, but I think I might just incorporate it into the pipeline from sendMessagesToAllEdges on down, because the PartitionId -> VertexId -> List<M> layout is baked in everywhere and that code path is guaranteed to be one message to many recipients.
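A minimal sketch of that "PartitionId -> M -> Set<I>" grouping (all names hypothetical, ints standing in for vertex ids and message payloads): when one message fans out to many recipients in the same partition, the payload is stored once with the set of destination vertex ids, instead of being repeated per destination as on the PartitionId -> VertexId -> List<M> path.

```java
import java.util.*;

// Hypothetical sketch of a "PartitionId -> M -> Set<I>" outbox, not
// Giraph's actual messaging code: ints stand in for ids and payloads.
public class DedupOutbox {
    // partition id -> message payload -> set of destination vertex ids
    private final Map<Integer, Map<Integer, Set<Integer>>> byPartition = new HashMap<>();

    void send(int partitionId, int message, int destVertexId) {
        byPartition
            .computeIfAbsent(partitionId, p -> new HashMap<>())
            .computeIfAbsent(message, m -> new HashSet<>())
            .add(destVertexId);
    }

    // Entries that would actually be serialized: one per distinct
    // (partition, message) pair, rather than one per (destination, message)
    // pair as on the baked-in path.
    int wireEntries() {
        int n = 0;
        for (Map<Integer, Set<Integer>> perMessage : byPartition.values()) {
            n += perMessage.size();
        }
        return n;
    }

    public static void main(String[] args) {
        DedupOutbox outbox = new DedupOutbox();
        // One payload (7) fanned out to three vertices in partition 0
        // and one vertex in partition 1:
        outbox.send(0, 7, 10);
        outbox.send(0, 7, 11);
        outbox.send(0, 7, 12);
        outbox.send(1, 7, 20);
        System.out.println(outbox.wireEntries()); // 2: payload stored once per partition
    }
}
```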

The out-of-core option just plain crashed no matter what I did, same as Netty with those settings; once the messages start being generated, they pile up very quickly wherever they sit on the data path. I've had the best luck so far with thousands of workers and in-core messaging plus the amortizing, but the growth is quite fast at the volume of input I'm dealing with, and in the end message deduplication plus this "amortization" is going to be the only way to cobble this together. This implementation got me in the door, but no amount of dividing at this level of message growth will get all the way there and still scale to run my input.

I think my goal is just to get something like a practical scale for triangle closing working for us, and then look at how to refine it into a more general way for Giraph users to cope with this kind of message growth. I hope to involve some on-disk buffering, and I'll certainly try it again, but the growth has to be managed before we can even get to that point.

Thanks for your input, I'll certainly be reviewing these ideas as I go along.

> Implement better message grouping to improve performance in SimpleTriangleClosingVertex
> ---------------------------------------------------------------------------------------
>                 Key: GIRAPH-314
>                 URL:
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Trivial
>             Fix For: 0.2.0
>         Attachments: GIRAPH-314-1.patch, GIRAPH-314-2.patch, GIRAPH-314-3.patch
> After running SimpleTriangleClosingVertex at scale, I'm thinking the sendMessageToAllEdges()
is pretty in the code, but it's not a good idea in practice, since each vertex V sends degree(V)^2
messages right in the first superstep of this algorithm. Could do something with a combiner
etc., but just grouping messages by hand at the application level by using IntArrayListWritable
again does the trick fine.
> Probably should have just done it this way before, but sendMessageToAllEdges() looked
so nice. Sigh. Changed unit tests to reflect this new approach; passes mvn verify and cluster.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
