hama-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Jungblut (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-704) Optimization of memory usage during message processing
Date Fri, 22 Feb 2013 13:16:12 GMT

    [ https://issues.apache.org/jira/browse/HAMA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584246#comment-13584246
] 

Thomas Jungblut commented on HAMA-704:
--------------------------------------

bq.Create an array of say 8 or 16 spilling queues, basically partitioning the messages into
vertice corresponding buckets. So spilling queue[0] would be responsible for messages for
first numVertices/16 vertices, queues[1] for the next numVertices/16. 

That is cool, you just have to make sure that they are sorted and getCurrentMessage returns
them in this order. Nothing much else is needed ;-)
Your approach has a bit of a problem, as the currentMessage must be the lowest in all the
received messages. So a heap structure wouldn't be wrong in this case although very difficult
to maintain the heap property on disk. Your bucketing must ensure that the order is correct,
otherwise sorting the buckets in itself won't yield a correct working algorithm.

bq.For sorted spilling queue I would have to sort throughout the whole messages.

I guess we can't get over that easily. But I told you in chat already that sorting on sender
side and merging on receiver side is in this case better and easier to implement.

bq.I do understand the messages don't have uniform distribution across vertices and we don't
need a complete B-tree implementation.

The good thing is that a b-tree is maintaining order over all keys, so you can iterate messages
with minimal number of disk seeks.

The message distribution is related to the number of active vertices- in case of pagerank
every vertex is active for all iterations (that is why I benchmark it usually).
We track how many vertices are active, so we can give a hint to the queue how much it should
cache. 

I guess we shouldn't get that fancy for the first shot, improvement can always be done later
on.
                
> Optimization of memory usage during message processing
> ------------------------------------------------------
>
>                 Key: HAMA-704
>                 URL: https://issues.apache.org/jira/browse/HAMA-704
>             Project: Hama
>          Issue Type: Improvement
>          Components: graph
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>            Priority: Critical
>             Fix For: 0.6.1
>
>         Attachments: HAMA-704_1337.patch, HAMA-704_1338.patch, HAMA-704.patch-v1, hama-704_v05.patch,
HAMA-704-v2.patch, localdisk.patch, mytest.patch, patch.txt, patch.txt, removeMsgMap.patch
>
>
> <vertex, message> map seems consume a lot of memory. We should figure out an efficient
way to reduce memory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message