giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Adsorption on giraph - memory problems
Date Wed, 03 Oct 2012 01:09:18 GMT
Hi Baldo,

We are using trove 
(http://mvnrepository.com/artifact/net.sf.trove4j/trove4j) to pack the 
vertices into a smaller size.  You are likely to see a nice benefit as 
well.  You could also try the out-of-core memory and vertex implementations.
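To illustrate the kind of packing trove enables, here is a minimal sketch of the same idea using only plain Java: intern each label string to an int id once, and store each vertex's distribution as parallel primitive arrays instead of a boxed Map<String, Double>. Trove's primitive maps (e.g. TIntDoubleHashMap) give a similar win without boxing; all class and method names below are hypothetical, not Giraph APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "pack vertices smaller" idea: a global label -> id
// dictionary plus primitive arrays per vertex, avoiding one boxed
// Double and String reference per (label, weight) pair.
public class PackedDistribution {

    // Label dictionary, built once and shared across the whole graph.
    static final Map<String, Integer> LABEL_IDS = new HashMap<>();

    static int labelId(String label) {
        return LABEL_IDS.computeIfAbsent(label, l -> LABEL_IDS.size());
    }

    final int[] labels;      // interned label ids
    final double[] weights;  // parallel primitive weights, no boxing

    PackedDistribution(String[] labelNames, double[] w) {
        labels = new int[labelNames.length];
        for (int i = 0; i < labelNames.length; i++) {
            labels[i] = labelId(labelNames[i]);
        }
        weights = w;
    }

    // Linear scan is fine for the short distributions typical here.
    double weightOf(String label) {
        Integer id = LABEL_IDS.get(label);
        if (id == null) return 0.0;
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] == id) return weights[i];
        }
        return 0.0;
    }

    public static void main(String[] args) {
        PackedDistribution d = new PackedDistribution(
                new String[] {"comedy", "drama"},
                new double[] {0.75, 0.25});
        System.out.println(d.weightOf("comedy")); // 0.75
        System.out.println(d.weightOf("horror")); // 0.0
    }
}
```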

Avery

On 10/2/12 6:01 PM, Baldo Faieta wrote:
> Hi Everyone.
>
> I have implemented the Adsorption algorithm
> ( http://rio.ecs.umass.edu/~lgao/ece697_10/Paper/random.pdf ,
> http://talukdar.net/papers/adsorption_ecml09.pdf ),
> as it seems well suited to running in Giraph. I'm testing it with the
> MovieLens dataset ( http://www.grouplens.org/node/73 ), and with a
> small graph (6k nodes, 200k edges) it runs fine.
>
> But as soon as I scale the graph up, I run into memory problems. I'm
> running with 3 processes and have set the mapred.map.child.java.opts
> property fairly high (2G per process). Looking at the memory allocation
> in each superstep, it seems that all the messages are held in memory
> during a superstep before being processed, and it runs out of memory
> quickly when I increase the size of the graph (e.g., 20k nodes,
> 1M edges).
>
> The algorithm works by sending label distributions along outgoing
> edges and aggregating the distributions when a vertex receives the
> messages. I have implemented a combiner for the messages, but it
> doesn't seem to help.
>
> I think the problem is that the messages themselves, because they are
> distributions, consume more memory than in other examples (e.g.,
> PageRank), and it seems you need a hefty memory allocation per process
> to keep all the messages in memory before they can be processed or even
> combined. Is this the case? Is there a way to be more aggressive with
> the combiner? Ideally, it would be great to store the messages on disk
> until they can be processed, so as not to run into this problem. Does
> anyone have any suggestions, or do I just have to get servers with much
> more memory?
>
> BTW, if anyone is interested, I can try to post the implementation. I am
> using it as a way to propagate resources to recommend to users based on
> the relations of the users to the resources and the interrelations
> between the resources with each other (e.g., user --viewed --> movie ,
> director --directed --> movie , movie --is-genre-of --> genre, etc.)
>
> Thanks,
>
> Baldo
>
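For reference, the distribution-summing step that Baldo's combiner performs can be sketched in plain Java as follows: two incoming messages (each a label -> weight map) are merged by summing the weights of shared labels, so only one message per destination vertex needs to stay in memory. Class and method names are hypothetical; this is not the actual Giraph Combiner API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: combine two label-distribution messages into one by summing
// weights per label, the way an Adsorption-style message combiner would.
public class DistributionCombiner {

    // Merge src into dst in place: shared labels have their weights
    // summed; labels only in src are added to dst.
    static void combine(Map<String, Double> dst, Map<String, Double> src) {
        for (Map.Entry<String, Double> e : src.entrySet()) {
            dst.merge(e.getKey(), e.getValue(), Double::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> a = new HashMap<>();
        a.put("comedy", 0.5);
        a.put("drama", 0.25);

        Map<String, Double> b = new HashMap<>();
        b.put("comedy", 0.25);
        b.put("horror", 0.25);

        combine(a, b);
        System.out.println(a.get("comedy")); // 0.75
        System.out.println(a.size());        // 3 labels after combining
    }
}
```

Note that a combiner like this only helps when multiple messages to the same destination vertex happen to meet on the same worker before delivery; it does not bound the peak number of in-flight messages by itself, which is consistent with the limited benefit Baldo observed.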

