incubator-giraph-dev mailing list archives

From "Claudio Martella (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages
Date Fri, 16 Dec 2011 18:32:31 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171119#comment-13171119 ]

Claudio Martella commented on GIRAPH-45:
----------------------------------------

in the current naive implementation (key=vertex_id, value=message) i keep an in-memory
SortedMap<I, Queue<M>> (a ConcurrentSkipListMap). when the map is under memory pressure i flush
it to disk as a new sorted file with its own BTree index and its own BloomFilter. this means
i may end up with multiple SequenceFiles once the messages from other peers have been
collected (at the beginning of each superstep).
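
a minimal sketch of the buffer-and-flush scheme described above, not the actual patch. assumptions: "memory pressure" is approximated by a plain message count, each flushed run is kept as an in-memory TreeMap standing in for one sorted on-disk file, and the names SpillingMessageBuffer, maxBufferedMessages, runCount, bufferedCount and keysOfRun are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: none of these names come from Giraph.
class SpillingMessageBuffer<I extends Comparable<I>, M> {
    // Sorted in-memory buffer: vertex id -> queue of pending messages.
    private final ConcurrentSkipListMap<I, Queue<M>> buffer = new ConcurrentSkipListMap<>();
    // Each entry stands in for one sorted file written at flush time.
    private final List<TreeMap<I, List<M>>> flushedRuns = new ArrayList<>();
    // Assumption: memory pressure is modelled as a simple message-count limit.
    private final int maxBufferedMessages;
    private int bufferedMessages = 0;

    SpillingMessageBuffer(int maxBufferedMessages) {
        this.maxBufferedMessages = maxBufferedMessages;
    }

    synchronized void add(I vertexId, M message) {
        buffer.computeIfAbsent(vertexId, k -> new ConcurrentLinkedQueue<>()).add(message);
        bufferedMessages++;
        if (bufferedMessages >= maxBufferedMessages) {
            flush();
        }
    }

    // Writes the whole buffer out as one new sorted run, then empties the buffer.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        TreeMap<I, List<M>> run = new TreeMap<>();
        for (Map.Entry<I, Queue<M>> entry : buffer.entrySet()) {
            run.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        flushedRuns.add(run);
        buffer.clear();
        bufferedMessages = 0;
    }

    synchronized int runCount() {
        return flushedRuns.size();
    }

    synchronized int bufferedCount() {
        return bufferedMessages;
    }

    // Vertex ids of the i-th flushed run, in sorted order.
    synchronized List<I> keysOfRun(int i) {
        return new ArrayList<>(flushedRuns.get(i).keySet());
    }
}
```

because the buffer is already sorted by vertex id, each flush is a single sequential pass, and a run is never modified after it has been written.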

to read the messages for a vertex at compute() time i ask each of these files for its
partial set of messages for that vertex. this costs at most N seeks to the blocks holding them,
where N is the number of files, and N is reached only when all N files have data for the
given vertex. the bloomfilter (and partially the index as well) is there exactly to avoid
seeks into files that hold nothing for the vertex. writing is append-only at flush.
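
a rough sketch of that read path, under stated assumptions: each file is modelled as an in-memory SortedRun, the bloom filter is a toy 1024-bit BitSet with two hash functions (not Hadoop's implementation), vertex ids are longs, messages are Strings, and the seeks counter only exists to make the skipped lookups visible. all class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for one flushed, sorted file plus its bloom filter.
class SortedRun {
    private static final int BITS = 1024;
    private final TreeMap<Long, List<String>> data = new TreeMap<>();
    private final BitSet bloom = new BitSet(BITS);
    // Counts simulated disk seeks, for illustration only.
    int seeks = 0;

    private static int h1(long id) {
        return Math.floorMod(Long.hashCode(id), BITS);
    }

    private static int h2(long id) {
        return Math.floorMod(Long.hashCode(id * 0x9E3779B97F4A7C15L), BITS);
    }

    void put(long vertexId, String message) {
        data.computeIfAbsent(vertexId, k -> new ArrayList<>()).add(message);
        bloom.set(h1(vertexId));
        bloom.set(h2(vertexId));
    }

    // May return a false positive, never a false negative.
    boolean mightContain(long vertexId) {
        return bloom.get(h1(vertexId)) && bloom.get(h2(vertexId));
    }

    // Simulates one seek into the file for the given vertex.
    List<String> read(long vertexId) {
        seeks++;
        return data.getOrDefault(vertexId, Collections.emptyList());
    }
}

class MessageReader {
    // Collects the partial message sets from every run that may hold the vertex.
    static List<String> messagesFor(long vertexId, List<SortedRun> runs) {
        List<String> result = new ArrayList<>();
        for (SortedRun run : runs) {
            if (run.mightContain(vertexId)) {
                result.addAll(run.read(vertexId));
            }
        }
        return result;
    }
}
```

a false positive from the filter only costs one wasted seek; the result stays correct because the run itself is still consulted.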

in the optimized implementation key=vertex_id and value=messages (all messages for a vertex
in one record), which should make serialization and deserialization a bit more efficient.
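
one way to picture the difference between the two layouts, as a sketch only. assumptions: vertex ids are longs, messages are ints, and the record formats below (id repeated per message versus id, count, then messages) are illustrative guesses, not the format of the patch. RecordLayouts, perMessage and grouped are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical comparison of the two key/value layouts.
class RecordLayouts {
    // Naive layout: one (vertex_id, message) record per message.
    static byte[] perMessage(long vertexId, List<Integer> messages) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int message : messages) {
                out.writeLong(vertexId);
                out.writeInt(message);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Grouped layout: vertex_id once, then a count, then all messages.
    static byte[] grouped(long vertexId, List<Integer> messages) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(vertexId);
            out.writeInt(messages.size());
            for (int message : messages) {
                out.writeInt(message);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

with these assumed types, three messages for one vertex take 36 bytes in the per-message layout and 24 bytes in the grouped one, and the key is deserialized once instead of three times.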

so, i'm never going to spill just a few tuples at a time. it really is a simplified version
of bigtable/hbase, where i take advantage of our particular demands/constraints to simplify
my life quite a lot (as i said: no random reads, no updates/deletes, single reader).
                
> Improve the way to keep outgoing messages
> -----------------------------------------
>
>                 Key: GIRAPH-45
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-45
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>
> As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a potential out-of-memory
> problem when the rate of message generation is higher than the rate of message flush
> (or network bandwidth).
> To overcome this problem, we need a more eager strategy for message flushing or some approach
> to spill messages to disk.
> The link below is Dmitriy's suggestion.
> https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
