giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-322) Run Length Encoding for Vertex#sendMessageToAllEdges might curb out of control message growth in large scale jobs
Date Thu, 13 Sep 2012 02:37:07 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Eli Reisman updated GIRAPH-322:
-------------------------------

    Attachment: GIRAPH-322-4.patch

These are some tweaks and improvements. I tried several ways to remove the "duplication-per-partition"
on the sender side, and learned this:

1) it can totally be done, and would deduplicate a lot of messages for all code paths from
Vertex#sendMessage etc.

2) it touches more code than I feel comfortable including in this JIRA when it should really
be a separate JIRA and we should do sendMessage() and sendMessageToAllEdges() at the same
time.

3) I can test GIRAPH-322 just fine using "-Dhash.userPartitionCount=# of workers" to see
what comes of this, and get this committed as its own fix, rolling the partition deduplication
into the other JIRA mentioned in #2. This idea can then be judged on its own merits
(or not).

4) For future reference, the JIRA mentioned in #2 would require WorkerInfo/PartitionOwner
type plumbing to be per-worker instances and not per-partition anymore, and would require
the Netty request acks like ClientRequestId to use the host-port combo for that worker as
a "destinationWorkerId" rather than the WorkerInfo's partitionId. That's about it. This would
be a good JIRA, a real win I think.
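
A minimal sketch of the per-worker idea in #4, under stated assumptions: the class, the method names, and the partition-to-owner map below are hypothetical illustrations and not Giraph's actual WorkerInfo/PartitionOwner API. It only shows that keying on a host-port "destinationWorkerId" yields one payload copy per worker instead of one per partition.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class WorkerDedupSketch {
    /** Hypothetical helper: builds a worker key from the host-port combo. */
    public static String destinationWorkerId(String host, int port) {
        return host + ":" + port;
    }

    /**
     * Returns the distinct workers that own the given target partitions.
     * The ownerOf map (partitionId -> worker key) is an assumption made
     * for illustration, not a structure taken from the patch.
     */
    public static Set<String> distinctWorkers(Collection<Integer> partitions,
                                              Map<Integer, String> ownerOf) {
        Set<String> workers = new TreeSet<>();
        for (Integer partitionId : partitions) {
            String worker = ownerOf.get(partitionId);
            if (worker == null) {
                throw new IllegalArgumentException("No owner for partition " + partitionId);
            }
            workers.add(worker);
        }
        return workers;
    }
}
```

If three target partitions are owned by two workers, a per-partition scheme sends three copies of a broadcast payload, while a per-worker scheme like this one would send two.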

So, here's a version that should bear some testing. I'm still on a laptop, but when I get to
set my Giraph rig up again at home I will definitely begin doing this. More soon...


                
> Run Length Encoding for Vertex#sendMessageToAllEdges might curb out of control message
growth in large scale jobs
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-322
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-322
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-322-1.patch, GIRAPH-322-2.patch, GIRAPH-322-3.patch, GIRAPH-322-4.patch
>
>
> Vertex#sendMessageToAllEdges is a case that goes against the grain of the data structures
and code paths used to transport messages through a Giraph application and out on the network.
While messages to a single vertex can be combined (and should be) in some applications that
could make use of this broadcast messaging, the out of control message growth of algorithms
like triangle closing means we need to de-duplicate messages bound for many vertices/partitions.
> This will be an evolving solution (this first patch is just the first step) and currently
it does not present a robust solution for disk-spill message stores. I figure I can get some
advice about that or it can be a follow-up JIRA if this turns out to be a fruitful pursuit.
This first patch is also Netty-only and simply defaults to the old sendMessageToAllEdges()
implementation if USE_NETTY is false. All this can be cleaned up when we know this works and/or
is worth pursuing.
> The idea is to send as few broadcast messages as possible by run-length encoding their
delivery and only duplicating messages on the network when they are bound for different partitions.
This works best when combined with "-Dhash.userPartitionCount=# of workers" so that
cross-partition duplication stays minimal.
> If this shows promise I will report back and keep working on this. As it is, it represents
an end-to-end solution, using Netty, for in-memory messaging. It won't break with spill to
disk, but you do lose the de-duplicating effect.
> More to follow, comments/ideas welcome. I expect this to change a lot as I test it and
ideas/suggestions crop up.
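
The grouping idea described above can be sketched roughly as follows. This is an illustration only, not the code in the attached patches: the class name, the modulo-based partitionOf() stand-in for a hash partitioner, and the use of long vertex IDs are all assumptions. It shows a broadcast's targets being grouped by destination partition so that the payload travels once per partition along with the list of vertex IDs it is bound for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BroadcastGroupingSketch {
    /** Hypothetical stand-in for a hash partitioner (assumption, not Giraph's). */
    public static int partitionOf(long vertexId, int partitionCount) {
        return (int) Math.floorMod(vertexId, (long) partitionCount);
    }

    /** Groups target vertex IDs by partition; one payload copy per map entry. */
    public static Map<Integer, List<Long>> groupByPartition(List<Long> targets,
                                                            int partitionCount) {
        Map<Integer, List<Long>> grouped = new TreeMap<>();
        for (long target : targets) {
            grouped.computeIfAbsent(partitionOf(target, partitionCount),
                                    k -> new ArrayList<>()).add(target);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Long> targets = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L);
        Map<Integer, List<Long>> grouped = groupByPartition(targets, 2);
        // Naive broadcast: one payload copy per target vertex.
        // Grouped broadcast: one payload copy per destination partition.
        System.out.println(targets.size() + " targets, "
            + grouped.size() + " payload copies");
    }
}
```

With six targets spread over two partitions, the payload is copied twice rather than six times. The fewer partitions there are per worker, the fewer copies cross the network, which is why the partition count setting mentioned above matters.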

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
