Subject: Re: To process a BIG input graph in giraph.
From: Suijian Zhou <suijian.zhou@gmail.com>
To: user@giraph.apache.org, ssc@apache.org
Date: Wed, 5 Mar 2014 10:31:20 -0600

Hi, Experts,
  Could anybody remind me how to load multiple input files on a Giraph command line? The following do not work; they only load the first input file:

-vip /user/hadoop/input/ttt.txt /user/hadoop/input/ttt2.txt
or
-vip /user/hadoop/input/ttt.txt -vip /user/hadoop/input/ttt2.txt

  Best Regards,
  Suijian

2014-03-01 16:12 GMT-06:00 Suijian Zhou <suijian.zhou@gmail.com>:
> Hi,
>   Here I'm trying to process a very big (~70GB) input file with Giraph.
> I'm running the Giraph program on a 40-node Linux cluster, but the
> program just gets stuck after reading in a small fraction of the input
> file. Although each node has 16GB of memory, it looks like only one node
> reads the input file (which is on HDFS) into its memory. As the input
> file is so big, is there a way to scatter it across all the nodes so
> that each node reads a fraction of the file and then starts processing
> the graph? Would it help to split the single big input file into many
> smaller files and let each node read in one of them (keeping the overall
> structure of the graph intact, of course)? Thanks!
>
> Best Regards,
> Suijian
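[Editor's note] On the splitting idea raised in the quoted message: one common workaround is to split the big file into several smaller files inside a single directory and point -vip at the directory, since a directory argument typically makes Hadoop-style input formats pick up every file it contains, giving workers independent splits to read in parallel. The sketch below demonstrates only the lossless local split step; the chunk count, all paths, and the commented giraph/hdfs invocations are illustrative assumptions, not taken from this thread.

```shell
# Sketch: split a large vertex/edge list into line-aligned chunks locally,
# then (on a real cluster) upload the directory and point -vip at it.
mkdir -p input_parts
seq 1 100000 > big_graph.txt                  # small stand-in for the ~70GB file
split -n l/8 big_graph.txt input_parts/part-  # 8 chunks, never breaking a line
cat input_parts/part-* > rejoined.txt
cmp big_graph.txt rejoined.txt && echo "split is lossless"
# On the cluster (illustrative, not run here):
#   hdfs dfs -put input_parts /user/hadoop/input_parts
#   hadoop jar giraph.jar ... -vip /user/hadoop/input_parts ...
```

GNU split's `-n l/8` mode divides the file into exactly 8 pieces without cutting any line in half, which matters here because each line is presumably one vertex or edge record.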