Subject: Re: To process a BIG input graph in giraph.
From: Suijian Zhou <suijian.zhou@gmail.com>
To: user@giraph.apache.org, ssc@apache.org
Date: Wed, 5 Mar 2014 10:31:20 -0600

Hi, Experts,
  Could anybody remind me how to load multiple input files on a Giraph command line? The following do not work; they only load the first input file:

-vip /user/hadoop/input/ttt.txt /user/hadoop/input/ttt2.txt
or
-vip /user/hadoop/input/ttt.txt -vip /user/hadoop/input/ttt2.txt

  Best Regards,
  Suijian

2014-03-01 16:12 GMT-06:00 Suijian Zhou <suijian.zhou@gmail.com>:
> Hi,
>   Here I'm trying to process a very big (~70GB) input file with Giraph.
> I'm running the Giraph program on a 40-node Linux cluster, but the
> program just gets stuck after reading in a small fraction of the input
> file. Although each node has 16GB of memory, it looks like only one node
> reads the input file (which is on HDFS) into its memory. As the input
> file is so big, is there a way to scatter it across all the nodes so
> that each node reads a fraction of the file and then starts processing
> the graph? Would it help to split the single big input file into many
> smaller files and let each node read in one of them (keeping the overall
> structure of the graph intact, of course)? Thanks!
>
> Best Regards,
> Suijian
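[Editor's note] On the splitting idea raised in the quoted message: one common workaround is to split the big file into several smaller files inside a single directory and point -vip at the directory, since a directory argument typically makes Hadoop-style input formats pick up every file it contains, giving workers independent splits to read in parallel. The sketch below demonstrates only the lossless local split step; the chunk count, all paths, and the commented giraph/hdfs invocations are illustrative assumptions, not taken from this thread.

```shell
# Sketch: split a large vertex/edge list into line-aligned chunks locally,
# then (on a real cluster) upload the directory and point -vip at it.
mkdir -p input_parts
seq 1 100000 > big_graph.txt                  # small stand-in for the ~70GB file
split -n l/8 big_graph.txt input_parts/part-  # 8 chunks, never breaking a line
cat input_parts/part-* > rejoined.txt
cmp big_graph.txt rejoined.txt && echo "split is lossless"
# On the cluster (illustrative, not run here):
#   hdfs dfs -put input_parts /user/hadoop/input_parts
#   hadoop jar giraph.jar ... -vip /user/hadoop/input_parts ...
```

GNU split's `-n l/8` mode divides the file into exactly 8 pieces without cutting any line in half, which matters here because each line is presumably one vertex or edge record.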