incubator-giraph-dev mailing list archives

From "Avery Ching (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages
Date Thu, 15 Dec 2011 23:01:33 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170558#comment-13170558 ]

Avery Ching commented on GIRAPH-45:
-----------------------------------

Claudio, thanks for your response.

I agree with your points about HDFS being expensive.  I think the only advantage of HDFS is
convenience: the reducers can easily get those files.  After thinking about it more, it
probably makes more sense to simply send the messages to the destination worker and do the
storage there.  This would allow the destination worker to process those messages whenever
it wants, and it can do the in-memory aggregation and dump to disk when memory pressure gets
too high.  Storing the messages on the sender complicates things, I believe.  It is simpler
for the sender to send its messages out when it is under memory pressure.
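
To make the destination-side idea concrete, here is a minimal sketch of a store that combines
incoming messages in memory and dumps them to disk once a threshold is crossed.  Everything in
it is an assumption for illustration, not Giraph code: the class name DestinationMessageStore,
the long vertex ids and double messages, the entry-count threshold standing in for "memory
pressure", and the single append-only spill file.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.function.BinaryOperator;

/** Hypothetical sketch of destination-side message storage; not Giraph's API. */
class DestinationMessageStore {
    private final Map<Long, Double> inMemory = new HashMap<>();
    private final BinaryOperator<Double> combiner; // e.g. Double::sum
    private final int maxEntries;                  // crude stand-in for memory pressure
    private final Path spillFile;
    private int spillCount = 0;

    DestinationMessageStore(BinaryOperator<Double> combiner, int maxEntries, Path spillFile) {
        this.combiner = combiner;
        this.maxEntries = maxEntries;
        this.spillFile = spillFile;
    }

    /** Combine an incoming message in memory; spill when the threshold is exceeded. */
    void receive(long vertexId, double message) {
        inMemory.merge(vertexId, message, combiner);
        if (inMemory.size() > maxEntries) {
            spill();
        }
    }

    /** Append the combined in-memory messages to the spill file and free the map. */
    private void spill() {
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
                Files.newOutputStream(spillFile,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND)))) {
            for (Map.Entry<Long, Double> e : inMemory.entrySet()) {
                out.writeLong(e.getKey());
                out.writeDouble(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        inMemory.clear();
        spillCount++;
    }

    /** Merge spilled and in-memory messages; no sorting is needed, only grouping by id. */
    Map<Long, Double> drain() {
        Map<Long, Double> result = new HashMap<>(inMemory);
        if (Files.exists(spillFile)) {
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(Files.newInputStream(spillFile)))) {
                while (true) {
                    long vertexId;
                    try {
                        vertexId = in.readLong();
                    } catch (EOFException endOfFile) {
                        break;
                    }
                    result.merge(vertexId, in.readDouble(), combiner);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return result;
    }

    int spillCount() {
        return spillCount;
    }
}
```

Note that drain() pulls everything back into one map, which is only reasonable for a toy
example; a real out-of-core version would read the spilled messages back in smaller pieces.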

I think it would be nice to have n files, where n == # of partitions owned by that worker.
Then, when loading and computing each partition, we load the relevant messages for that partition
and populate every vertex's message list.
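
As a rough illustration of the one-file-per-partition layout, the sketch below appends each
message to its partition's file and, at load time, groups the records into per-vertex message
lists.  The names (PartitionMessageFiles, the "partition-<id>.msgs" file naming) and the
long/double record format are made up for the example and are not part of Giraph.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/** Hypothetical sketch: one append-only message file per partition owned by a worker. */
class PartitionMessageFiles {
    private final Path dir;

    PartitionMessageFiles(Path dir) {
        this.dir = dir;
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private Path fileFor(int partitionId) {
        return dir.resolve("partition-" + partitionId + ".msgs");
    }

    /** Append one (vertexId, message) record to the partition's file. */
    void append(int partitionId, long vertexId, double message) {
        // Reopening the file per message keeps the sketch short; real code would batch writes.
        try (DataOutputStream out = new DataOutputStream(
                Files.newOutputStream(fileFor(partitionId),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            out.writeLong(vertexId);
            out.writeDouble(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Read the partition's file and build every vertex's message list. */
    Map<Long, List<Double>> load(int partitionId) {
        Map<Long, List<Double>> byVertex = new HashMap<>();
        Path file = fileFor(partitionId);
        if (!Files.exists(file)) {
            return byVertex; // no messages were stored for this partition
        }
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                long vertexId;
                try {
                    vertexId = in.readLong();
                } catch (EOFException endOfFile) {
                    break;
                }
                byVertex.computeIfAbsent(vertexId, k -> new ArrayList<>()).add(in.readDouble());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return byVertex;
    }
}
```

The records are written in arrival order and grouped with a hash map on load, so nothing in
this layout requires the messages to be sorted.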

I am wondering why you need a BTree?  We don't need to sort the messages.

I think that the memory management of the partitions can be done orthogonally.  I'll open
another JIRA.  No need to rush on the messaging improvement.  I've realized that by streaming
the messages as Dmitriy suggested in combination with a combiner executed on the destination
worker, memory usage can be held somewhat at bay for lots of applications.  Still, storing
the messages out-of-core will be important for large graphs.
                
> Improve the way to keep outgoing messages
> -----------------------------------------
>
>                 Key: GIRAPH-45
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-45
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>
> As discussed in GIRAPH-12(http://goo.gl/CE32U), I think that there is a potential problem
> to cause out of memory when the rate of message generation is higher than the rate of message
> flush (or network bandwidth).
> To overcome this problem, we need more eager strategy for message flushing or some approach
> to spill messages into disk.
> The below link is Dmitriy's suggestion.
> https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
