giraph-user mailing list archives

From David Garcia
Subject Re: Caching (with LRU or something) strategy in Giraph?
Date Wed, 01 Feb 2012 06:17:11 GMT
Hey Jake, thanks for the reply.  I'll look at GIRAPH-45 for this particular topic.  Really quick,
though: I thought that Pregel was an implementation of BSP (a programming model, completely
orthogonal to the manner in which data is retrieved or stored).  It seems quite reasonable
to implement a basic caching strategy for the case where not all of a worker's vertices fit
in memory.  Thanks again for your input.  I'll direct my question to the GIRAPH-45 topic.


From: Jake Mannix
Date: Wed, 1 Feb 2012 00:01:02 -0600
Subject: Re: Caching (with LRU or something) strategy in Giraph?

Hi David,

  The *point* of the Pregel architecture (which Giraph is an implementation of) is that the
whole graph is in (distributed) memory.  If you are willing to go to disk, doing your calculations
via MapReduce (possibly talking to a distributed hashtable of some kind colocated with your
hadoop cluster, if it helps) is the straightforward way to go.


On Tue, Jan 31, 2012 at 9:34 PM, David Garcia wrote:
I haven't investigated this too deeply, but is there a caching strategy implemented,
or in the works, for getting around having to load all of a split's vertices into memory?
If a graph is large enough, even a reasonably sized cluster may not have enough memory to
load all the vertices.  Does Giraph address this currently?
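
For readers unfamiliar with the kind of LRU strategy the question refers to, here is a minimal sketch in Java. It is not Giraph code and does not reflect what GIRAPH-45 implements: the class name LruVertexCache and its capacity parameter are hypothetical, and the sketch only drops evicted entries, where an out-of-core store would presumably write them to disk first. It relies on the access-order mode of java.util.LinkedHashMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of Giraph: a bounded in-memory map that
// evicts the least recently used entry once its capacity is exceeded.
class LruVertexCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruVertexCache(int capacity) {
        // accessOrder = true keeps entries ordered from least to most
        // recently accessed, which is what makes LRU eviction possible.
        super(16, 0.75f, true);
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Assumption: a real out-of-core store would spill `eldest` to disk
        // here before letting it go; this sketch simply discards it.
        return size() > capacity;
    }
}
```

With a capacity of 2, inserting vertices 1 and 2, reading vertex 1, and then inserting vertex 3 evicts vertex 2, because it is the least recently used entry at that point.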

