giraph-user mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: To process a BIG input graph in giraph.
Date Wed, 05 Mar 2014 17:38:21 GMT
-vip /user/hadoop/input should be enough.
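
For context, here is a minimal dry-run sketch of where that option sits in a full invocation. It only prints the command rather than running it. The jar name, computation class, input/output formats, output path, and worker count are assumptions chosen for illustration; none of them come from this thread, so substitute your own.

```shell
#!/bin/sh
# Dry run: build the GiraphRunner command line and print it.
# Everything except INPUT_DIR is a placeholder (assumption), not from the thread.
INPUT_DIR=/user/hadoop/input      # directory holding ttt.txt and ttt2.txt
GIRAPH_JAR=giraph-examples.jar    # hypothetical jar name
WORKERS=4                         # hypothetical worker count

CMD="hadoop jar $GIRAPH_JAR org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip $INPUT_DIR \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/hadoop/output \
-w $WORKERS"

# Print the assembled command so it can be inspected before running it.
echo "$CMD"
```

The point is that -vip appears once and names the directory, not the individual files inside it.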


On Wed, Mar 5, 2014 at 5:31 PM, Suijian Zhou <suijian.zhou@gmail.com> wrote:

> Hi, Experts,
>   Could anybody remind me how to load multiple input files in a Giraph
> command line? The following do not work; they only load the first input
> file:
> -vip /user/hadoop/input/ttt.txt   /user/hadoop/input/ttt2.txt
> or
> -vip /user/hadoop/input/ttt.txt  -vip /user/hadoop/input/ttt2.txt
>
>   Best Regards,
>   Suijian
>
>
>
>
> 2014-03-01 16:12 GMT-06:00 Suijian Zhou <suijian.zhou@gmail.com>:
>
> Hi,
>>   I'm trying to process a very big input file (~70GB) with Giraph.
>> I'm running the Giraph program on a 40-node Linux cluster, but the program
>> just gets stuck after it reads in a small fraction of the input file.
>> Although each node has 16GB of memory, it looks like only one node reads
>> the input file (which is on HDFS) into its memory. As the input file is so
>> big, is there a way to scatter it across all the nodes so that each node
>> reads in a fraction of the file and then starts processing the graph? Would
>> it help to split the single big input file into many smaller files and let
>> each node read in one of them (keeping the overall structure of the graph
>> intact, of course)? Thanks!
>>
>>   Best Regards,
>>   Suijian
>>
>>
>


-- 
   Claudio Martella
