giraph-user mailing list archives

From: Suijian Zhou <suijian.z...@gmail.com>
Subject: Re: To process a BIG input graph in giraph.
Date: Wed, 05 Mar 2014 20:46:01 GMT
Thanks Claudio, it works! By the way, I'd like to understand the logic
behind graph loading in Giraph, i.e. which of the following is true:

1). The master node loads the entire graph into its memory, then splits it
and distributes the blocks of the graph to all slave nodes.
2). The master and all slaves participate in graph loading from the
beginning; each node loads and stores its own part of the graph, then
starts to process it.
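
(A quick check, assuming the path from this thread: the standard HDFS fsck
command below lists every block of the single 70GB file and the datanodes
holding it, i.e. the file itself is already stored as distributed blocks
across the cluster.)

  hadoop fsck /user/hadoop/input/ttt.txt -files -blocks -locations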

It seems 2) is correct, but my Giraph program just got stuck after it had
loaded part of the VERY big graph (~70GB, one single file). I have 40
nodes, each with 16GB of memory. This confuses me, which is why I plan to
split this big graph input file into multiple smaller files.
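
For the record, a minimal sketch of that plan, assuming the standard Unix
split tool and the GiraphRunner options from the Giraph quick start (the
part size, jar name, the input_parts directory, and the computation/format
classes below are placeholders, not verified on this cluster):

  # split the 70GB file into parts of ~50M lines each (size is a guess)
  split -l 50000000 ttt.txt part_

  # upload all the parts into a single, fresh HDFS directory
  hadoop fs -mkdir /user/hadoop/input_parts
  hadoop fs -put part_* /user/hadoop/input_parts/

  # point -vip at the directory, as Claudio suggested, so all parts are read
  hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.SimpleShortestPathsComputation \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip /user/hadoop/input_parts \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/hadoop/output/sssp \
    -w 40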

  Best Regards,
  Suijian



2014-03-05 11:38 GMT-06:00 Claudio Martella <claudio.martella@gmail.com>:

> -vip /user/hadoop/input should be enough.
>
>
> On Wed, Mar 5, 2014 at 5:31 PM, Suijian Zhou <suijian.zhou@gmail.com> wrote:
>
>> Hi, Experts,
>>   Could anybody remind me how to load mutiple input files in a giraph
>> command line? The following do not work, they only load the first input
>> file:
>> -vip /user/hadoop/input/ttt.txt   /user/hadoop/input/ttt2.txt
>> or
>> -vip /user/hadoop/input/ttt.txt  -vip /user/hadoop/input/ttt2.txt
>>
>>   Best Regards,
>>   Suijian
>>
>>
>>
>>
>> 2014-03-01 16:12 GMT-06:00 Suijian Zhou <suijian.zhou@gmail.com>:
>>
>>> Hi,
>>>   Here I'm trying to process a very big input file through Giraph,
>>> ~70GB. I'm running the Giraph program on a 40-node Linux cluster, but the
>>> program just gets stuck after it reads in a small fraction of the input
>>> file. Although each node has 16GB of memory, it looks like only one node
>>> reads the input file from HDFS (into its memory). As the input file is so
>>> big, is there a way to scatter the input file across all the nodes so that
>>> each node reads in a fraction of the file and then starts processing the
>>> graph? Would it help to split the single big input file into many smaller
>>> files and let each node read one of them to process (of course, the
>>> overall structure of the graph should be kept)? Thanks!
>>>
>>>   Best Regards,
>>>   Suijian
>>>
>>>
>>
>
>
> --
>    Claudio Martella
>
>
