giraph-user mailing list archives

From Puneet Jain
Subject Job settings to run PageRank on 75M vertices
Date Wed, 24 Jul 2013 22:30:48 GMT

I am struggling to run PageRank on a graph of 75M vertices, where each vertex
has between 1 and 75,000 edges.

I constantly get ZooKeeper timeouts, regardless of my configuration.

- I have a 21-node Hadoop cluster; each node has 4 cores and 4 GB of memory.
- The data is stored in HBase as an adjacency matrix.
- I am running 21 region servers and 3 ZooKeeper servers.
- I am using the standard PageRankComputation class; my vertex ID is a long.
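
For scale, here is a rough back-of-envelope sketch of what these numbers imply per node. It assumes things not stated above: that vertices are spread evenly, that there is one worker per node, and that the whole 4 GB is available to that worker (it will not be, since the JVM and other services also need memory). The class and method names are made up for illustration.

```java
public class ClusterLoadEstimate {
    // Figures taken from the setup described above.
    static final long VERTICES = 75_000_000L;
    static final int NODES = 21;
    static final long MEMORY_PER_NODE_BYTES = 4L * 1024 * 1024 * 1024; // 4 GB
    static final long MAX_EDGES_PER_VERTEX = 75_000L;
    static final int LONG_ID_BYTES = Long.BYTES; // vertex IDs are longs

    // Assumption: vertices are partitioned evenly, one worker per node.
    static long verticesPerNode() {
        return (VERTICES + NODES - 1) / NODES; // ceiling division
    }

    // Assumption: all node memory is usable for graph data (optimistic).
    static long memoryBudgetPerVertexBytes() {
        return MEMORY_PER_NODE_BYTES / verticesPerNode();
    }

    // Lower bound for the largest vertex: target IDs only, no edge values
    // and no object overhead.
    static long minBytesForLargestVertexEdges() {
        return MAX_EDGES_PER_VERTEX * LONG_ID_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("Vertices per node:            " + verticesPerNode());
        System.out.println("Bytes available per vertex:   " + memoryBudgetPerVertexBytes());
        System.out.println("Min bytes, 75,000-edge vertex: " + minBytesForLargestVertexEdges());
    }
}
```

Under these assumptions the average memory available per vertex is on the order of a kilobyte, while the target IDs of a single 75,000-edge vertex alone need several hundred kilobytes. The actual edge distribution is not known, so this only bounds the problem and does not size it.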

I am setting only these parameters:
GiraphConfiguration.SPLIT_MASTER_WORKER.set(giraphConf, false);
GiraphConfiguration.USE_SUPERSTEP_COUNTERS.set(giraphConf, false);
GiraphConfiguration.CHECKPOINT_FREQUENCY.set(giraphConf, 0);

Most other settings are left at their default values.

