giraph-user mailing list archives

From: yavuz gokirmak <>
Subject: Estimating approximate Hadoop cluster size
Date: Mon, 20 Feb 2012 06:45:12 GMT
Hi again,

I am trying to estimate the minimum requirements for running graph analysis
over my input data.

In the shortest path example it is said that
"The first thing that happens is that getSplits() is called by the master
and then the workers will process the InputSplit objects with the
VertexReader to load their portion of the graph into memory"
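
To make sure I picture this correctly, here is a toy sketch in plain Java
(Split, getSplits and the numbers are my own illustrative stand-ins, not
Giraph's actual classes) of the split-then-load idea as I understand it:
the master divides the input into splits, and each worker loads only its
own split into memory, so the whole graph ends up resident across the
workers combined.

import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Hypothetical stand-in for an InputSplit: a byte range of the input.
    record Split(long start, long length) {}

    // The "master" side: divide the input into one split per worker.
    static List<Split> getSplits(long totalBytes, int numWorkers) {
        List<Split> splits = new ArrayList<>();
        long chunk = (totalBytes + numWorkers - 1) / numWorkers; // ceiling division
        for (long off = 0; off < totalBytes; off += chunk) {
            splits.add(new Split(off, Math.min(chunk, totalBytes - off)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // 100 GB of graph data over 25 workers -> about 4 GB per worker.
        for (Split s : getSplits(100L << 30, 25)) {
            System.out.printf("split at %d, %.1f GB%n",
                    s.start(), s.length() / (double) (1L << 30));
        }
    }
}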

What I understood is that at a given time T the entire graph must be loaded
in memory across the cluster. If I have 100 GB of graph data, will I need
25 machines with 4 GB of RAM each?

If this is the case, I have a big memory problem analyzing 4 TB of data :)
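
For what it is worth, here is my back-of-envelope calculation (assuming the
in-memory size roughly equals the on-disk size; in practice Java object
overhead probably makes it larger):

public class ClusterSizing {
    public static void main(String[] args) {
        double onDiskGb = 4096.0;        // 4 TB of input data
        double heapGbPerMachine = 4.0;   // assumed RAM usable by Giraph per machine
        double expansionFactor = 1.0;    // assumed in-memory vs. on-disk ratio
        long machines = (long) Math.ceil(onDiskGb * expansionFactor / heapGbPerMachine);
        System.out.println("~" + machines + " machines needed"); // prints ~1024
    }
}

By the same arithmetic, 100 GB / 4 GB per machine gives the 25 machines I
mentioned above.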

best regards.
