giraph-user mailing list archives

From Suijian Zhou <suijian.z...@gmail.com>
Subject To process a BIG input graph in giraph.
Date Sat, 01 Mar 2014 22:12:11 GMT
Hi,
  Here I'm trying to process a very big input file (~70 GB) through Giraph.
I'm running the Giraph program on a 40-node Linux cluster, but the program
gets stuck after reading in only a small fraction of the input file.
Although each node has 16 GB of memory, it looks like only one node reads
the input file (which is on HDFS) into its memory. Since the input file is
so big, is there a way to scatter it across all the nodes so that each node
reads in a fraction of the file and then starts processing the graph? Would
it help to split the single big input file into many smaller files and let
each node read in one of them (keeping the overall structure of the graph
intact, of course)? Thanks!
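The splitting idea in the question can be sketched as a small pre-processing step. The helper below is purely illustrative (the function name, paths, and part count are not from any Giraph API): it distributes the lines of a large adjacency-list file round-robin into N part files, cutting only on line boundaries so each vertex record stays whole and the graph structure is preserved. On a real cluster the parts would then be uploaded to HDFS (e.g. with `hdfs dfs -put`) so that the input format can hand one split to each worker.

```python
# Hypothetical helper, not part of Giraph: split a large adjacency-list
# file into num_parts files on line boundaries, so every vertex record
# (one line = one vertex plus its edges, as in Giraph's text input
# formats) stays intact.
import itertools
from pathlib import Path


def split_graph_file(src: str, out_dir: str, num_parts: int) -> list:
    """Distribute the lines of `src` round-robin into `num_parts` files."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    part_paths = [Path(out_dir) / ("part-%05d" % i) for i in range(num_parts)]
    writers = [p.open("w") for p in part_paths]
    try:
        with open(src) as f:
            # cycle() pairs each input line with the next writer in turn,
            # so lines (vertex records) are spread evenly across parts.
            for writer, line in zip(itertools.cycle(writers), f):
                writer.write(line)
    finally:
        for w in writers:
            w.close()
    return [str(p) for p in part_paths]
```

Note that whether this is needed at all depends on the input format: Giraph normally relies on Hadoop input splits to parallelize loading, so a single node doing all the reading can also indicate that the job is running with too few input splits or workers rather than a problem with the file itself.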

  Best Regards,
  Suijian
