giraph-user mailing list archives

From Avery Ching <>
Subject Re: Estimating approximate hadoop cluster size
Date Mon, 20 Feb 2012 06:59:54 GMT
Yes, you will need a lot of RAM until we get out-of-core partitions
and/or out-of-core messages. Do you really need to load all 4 TB of
data? The vertex index, vertex value, edge value, and message value
objects all take up space, as do the data structures that store them,
so your estimates are definitely too low. How big is the actual
graph that you are trying to analyze, in terms of vertices and edges?
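A back-of-the-envelope sketch of why a raw-data-size estimate undercounts. It assumes you know the vertex, edge and per-superstep message counts. The per-vertex, per-edge and per-message byte costs and the usable heap fraction are hypothetical placeholders, not Giraph figures, so measure them on a sample of your own data before trusting the result.

```python
import math

GB = 1024 ** 3


def estimate_graph_bytes(num_vertices, num_edges, num_messages,
                         bytes_per_vertex, bytes_per_edge, bytes_per_message):
    """Rough in-memory footprint of the loaded graph plus one superstep's messages.

    The per-object byte costs are ASSUMED inputs; they should include the
    vertex index/value, edge value, message value objects and the overhead
    of the data structures that hold them.
    """
    return (num_vertices * bytes_per_vertex
            + num_edges * bytes_per_edge
            + num_messages * bytes_per_message)


def workers_needed(graph_bytes, heap_bytes_per_worker, usable_fraction=0.5):
    """Minimum number of workers so each worker's share fits in its usable heap.

    usable_fraction is a HYPOTHETICAL allowance for JVM and framework overhead.
    """
    usable = heap_bytes_per_worker * usable_fraction
    return math.ceil(graph_bytes / usable)


if __name__ == "__main__":
    # Naive estimate from the question: 100 GB of input, 4 GB per machine, no overhead.
    print(workers_needed(100 * GB, 4 * GB, usable_fraction=1.0))  # 25

    # Hypothetical graph: 500M vertices, 5B edges, one message per edge,
    # with ASSUMED costs of 100 B/vertex, 40 B/edge, 24 B/message.
    total = estimate_graph_bytes(500_000_000, 5_000_000_000, 5_000_000_000,
                                 100, 40, 24)
    print(workers_needed(total, 4 * GB))  # 173
```

Counting messages and leaving headroom in each heap is what pushes the answer well past the naive "input size divided by RAM per machine" figure.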


On 2/19/12 10:45 PM, yavuz gokirmak wrote:
> Hi again,
> I am trying to estimate the minimum requirements for running graph
> analysis over my input data.
> The shortest path example says that
> "The first thing that happens is that getSplits() is called by the 
> master and then the workers will process the InputSplit objects with 
> the VertexReader to load their portion of the graph into memory"
> What I understood is that, at some point, all graph nodes must be
> loaded into cluster memory.
> If I have 100 GB of graph data, will I need 25 machines with 4 GB
> of RAM each?
> If that is the case, I have a big memory problem analyzing 4 TB of data :)
> Best regards.
