hama-dev mailing list archives

From "Thomas Jungblut (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-704) Optimization of memory usage during message processing
Date Wed, 20 Feb 2013 16:27:13 GMT

    [ https://issues.apache.org/jira/browse/HAMA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582292#comment-13582292 ]

Thomas Jungblut commented on HAMA-704:
--------------------------------------

VertexID in the vertex must be compatible. That is actually enough for everything.

I was just profiling the memory leak using 1 million PageRank vertices with 10 edges each (50 MB).

Here is a much more detailed memory analysis:

_After Reading Vertices to RAM in setup (Superstep2)_

600 MB raw heap usage.
418199256 bytes occupied by the vertices.
287999928 bytes occupied by Text objects (48000000 bytes used as vertex keys, the rest is edge bytes).
237999192 bytes occupied by Edges (Text objects and null references).
 
_In the first superstep_

1.5 GB heap.
Vertex memory keeps constant. Messages are as follows:
5 million GraphJobMessages (only half of the out edges) take 225 MB. With all messages, this sums
up to a bit less than 500 MB (10 times the graph size!).
Each vertex message contains ~40 bytes: 20 for the Text, 20 for the DoubleWritable.
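
The arithmetic behind these figures can be checked with a short sketch. The class and method names are hypothetical, and it assumes that "MB" means 10^6 bytes and that the 225 MB figure covers exactly 5 million messages.

```java
public final class MessageFootprint {

    /** Average bytes per message, given a measured total (integer division). */
    public static long bytesPerMessage(long totalBytes, long messageCount) {
        return totalBytes / messageCount;
    }

    /** Projected total bytes if every out edge carries exactly one message. */
    public static long projectedBytes(long bytesPerMessage, long vertices, long edgesPerVertex) {
        return bytesPerMessage * vertices * edgesPerVertex;
    }

    public static void main(String[] args) {
        // Figures from the profile above: 5 million messages in 225 MB.
        long perMessage = bytesPerMessage(225_000_000L, 5_000_000L);
        // 1 million vertices with 10 out edges each.
        long total = projectedBytes(perMessage, 1_000_000L, 10L);
        System.out.println(perMessage + " bytes per message");
        System.out.println(total / 1_000_000L + " MB for all messages");
    }
}
```

Under these assumptions each message costs about 45 bytes, slightly above the ~40 bytes of payload, and 10 million messages come to 450 MB, which matches "a bit less than 500 MB".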

_In the fourth superstep (of 6 in total)_

GC'd back down to 1.1 GB.
BSPMessageBundle contains 4.1 million messages and is in memory only once. However, the linked
list in the bundle's hashmap contains 1 MB of data.
Maybe we can switch to an ArrayList again; array-backed lists are much more compact in memory because they aren't
doubly linked.
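
To get a feeling for the difference, here is a rough estimate of the per-element bookkeeping cost of the two list types. It is only a sketch: the 12-byte object header and 4-byte references are assumptions for a 64-bit JVM with compressed references, the class and method names are hypothetical, and the numbers cover list overhead only, not the messages themselves.

```java
public final class ListOverhead {

    // Assumed sizes; real values depend on the JVM and its flags.
    static final long OBJECT_HEADER_BYTES = 12;
    static final long REFERENCE_BYTES = 4;

    /** A java.util.LinkedList node holds item, next and prev references plus a header, padded to 8 bytes. */
    public static long linkedNodeBytes() {
        long raw = OBJECT_HEADER_BYTES + 3 * REFERENCE_BYTES;
        return (raw + 7) / 8 * 8;
    }

    /** Bookkeeping bytes for a doubly linked list: one node object per element. */
    public static long linkedListOverhead(long elements) {
        return elements * linkedNodeBytes();
    }

    /** Bookkeeping bytes for an array-backed list: one reference slot per element,
     *  ignoring spare capacity and the array header. */
    public static long arrayListOverhead(long elements) {
        return elements * REFERENCE_BYTES;
    }

    public static void main(String[] args) {
        long messages = 4_100_000L; // message count from the profile above
        System.out.println("LinkedList overhead: " + linkedListOverhead(messages) + " bytes");
        System.out.println("ArrayList overhead:  " + arrayListOverhead(messages) + " bytes");
    }
}
```

With these assumed sizes, a linked node costs 24 bytes per element against 4 bytes for an array slot. An ArrayList can temporarily hold spare capacity while it grows, so the real gap is smaller than the estimate suggests.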


However, everything is collected properly, so there is no memory leak in my opinion.

BTW: is it intended that VerticesInfo does a linear search for every vertex? That is slow
as hell.
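
For illustration, this is the kind of change I have in mind, as a minimal sketch. SimpleVertex, findLinear and buildIndex are hypothetical names and not the VerticesInfo API; the point is only the contrast between scanning a list per lookup and building a hash index once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class VertexLookup {

    /** Hypothetical stand-in for a vertex; not Hama's Vertex class. */
    public static final class SimpleVertex {
        public final String id;
        public final double value;

        public SimpleVertex(String id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Linear search: O(n) per lookup, so O(n * m) for m lookups. */
    public static SimpleVertex findLinear(List<SimpleVertex> vertices, String id) {
        for (SimpleVertex v : vertices) {
            if (v.id.equals(id)) {
                return v;
            }
        }
        return null;
    }

    /** Builds the index once in O(n); each later lookup is O(1) on average. */
    public static Map<String, SimpleVertex> buildIndex(List<SimpleVertex> vertices) {
        Map<String, SimpleVertex> index = new HashMap<>(vertices.size() * 2);
        for (SimpleVertex v : vertices) {
            index.put(v.id, v);
        }
        return index;
    }
}
```

The index costs extra memory per vertex (a map entry plus table slot), so given the topic of this issue it is a trade-off rather than a free win.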
                
> Optimization of memory usage during message processing
> ------------------------------------------------------
>
>                 Key: HAMA-704
>                 URL: https://issues.apache.org/jira/browse/HAMA-704
>             Project: Hama
>          Issue Type: Improvement
>          Components: graph
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>            Priority: Critical
>             Fix For: 0.6.1
>
>         Attachments: HAMA-704.patch-v1, hama-704_v05.patch, HAMA-704-v2.patch, localdisk.patch, mytest.patch, patch.txt, patch.txt, removeMsgMap.patch
>
>
> <vertex, message> map seems to consume a lot of memory. We should figure out an efficient
> way to reduce memory usage.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
