Date: Fri, 18 Oct 2013 03:03:18 -0400
From: Jianqiang Ou
To: user@giraph.apache.org
Subject: Re: how to use out of core options

Thanks. I just tried another dataset, one that my cluster can handle successfully in memory. However, exceptions still occurred with the -Dgiraph.useOutOfCoreGraph=true option, while the job works fine with only the -Dgiraph.useOutOfCoreMessages=true option. So do you still think it is a directory permission issue?

By the way, the dir path you mentioned is the directory where the out-of-core partitions and messages are stored on the local file system, right? But how do I know where it is? It should be determined by Giraph rather than by the application, right?

Thanks for your time and patience again,
Jian


On Thu, Oct 17, 2013 at 5:32 PM, Jyotirmoy Sundi <sundi133@gmail.com> wrote:
Apart from these, you might also want to check the permissions of the directory path where vertices and messages are offloaded.
Ideally, Giraph is not meant for out-of-core processing: if your graph is much bigger than the cluster can handle in memory, using Giraph defeats the purpose.
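A minimal sketch of the permission check suggested above, in POSIX shell. It only asks whether a directory exists and is writable by the current user (or, if it does not exist yet, whether it could be created in its parent), so it would need to be run on each worker node as the user the Hadoop tasks run under. The directory path is a placeholder you must supply, and `check_ooc_dir` is a hypothetical helper name, not part of Giraph or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: check that a local directory intended for out-of-core
# partitions/messages can be written by the current user. If the directory
# does not exist yet, check that it could be created in its parent instead.
check_ooc_dir() {
  dir=$1
  if [ -d "$dir" ]; then
    target=$dir
  else
    target=$(dirname "$dir")
  fi
  # Creating files needs both write (w) and search (x) permission on the directory.
  if [ -d "$target" ] && [ -w "$target" ] && [ -x "$target" ]; then
    echo "ok: $target"
    return 0
  fi
  echo "not writable: $target"
  return 1
}

# Example (placeholder path): check_ooc_dir /path/to/giraph/partitions
```

A non-zero exit status from the helper points at a permission or missing-parent problem on that node; a zero status does not rule out other causes of the failure.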



On Thu, Oct 17, 2013 at 8:13 AM, Jianqiang Ou <oujianqiangooy@gmail.com> wrote:
Thanks very much. So are you saying that if I use -Dgiraph.maxPartitionsInMemory and -Dgiraph.maxMessagesInMemory to set both to smaller numbers, it might work?

Thanks again,
Jian
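For reference, a dry-run sketch of the PageRank command from this thread with the two limits added. It only prints the command; the values passed in are placeholders to experiment with, not recommended settings, and `build_pagerank_cmd` is a hypothetical helper. As we understand Giraph's out-of-core options, `giraph.maxPartitionsInMemory` only takes effect when `giraph.useOutOfCoreGraph=true` and `giraph.maxMessagesInMemory` only when `giraph.useOutOfCoreMessages=true`, so both switches stay on.

```shell
#!/bin/sh
# Dry-run sketch: print (not run) the PageRank command from this thread with
# explicit out-of-core limits added. Limit values are placeholders.
build_pagerank_cmd() {
  max_partitions=$1   # value for giraph.maxPartitionsInMemory
  max_messages=$2     # value for giraph.maxMessagesInMemory
  jar="$GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar"
  # All -D options go directly after GiraphRunner, before the computation class.
  set -- "$HADOOP_HOME/bin/hadoop" jar "$jar" \
    org.apache.giraph.GiraphRunner \
    -Dgiraph.useOutOfCoreGraph=true \
    -Dgiraph.maxPartitionsInMemory="$max_partitions" \
    -Dgiraph.useOutOfCoreMessages=true \
    -Dgiraph.maxMessagesInMemory="$max_messages" \
    org.apache.giraph.examples.SimplePageRankComputation \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip /user/andy/input/tiny_graph.txt \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/andy/output/page3 -w 3 \
    -mc 'org.apache.giraph.examples.SimplePageRankComputation$SimplePageRankMasterCompute'
  # Print the assembled command line instead of executing it.
  printf '%s\n' "$*"
}
```

To actually launch the job, the final `printf` line could be replaced with `"$@"`, which runs the assembled command with its arguments intact.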


On Thu, Oct 17, 2013 at 12:56 AM, Jyotirmoy Sundi <sundi133@gmail.com> wrote:
You need to tune it for your cluster. This is what the docs say:
"It is difficult to decide a general policy to use out-of-core capabilities, as it depends on the behavior of the algorithm and the input graph. The exact number of partitions and messages to keep in memory depends on the cluster capabilities, the number of messages produced per superstep, and number of active vertices per superstep. Moreover, it depends on the type and size of vertex values and messages. For example, algorithms such as Belief Propagation tend to keep large vertex values, while algorithms such as clique computations tend to send large messages along. Hence, it depends on your algorithm what feature to rely on more."

Thanks
Sundi
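The docs' point that the right numbers depend on the cluster, the message volume, and the size of vertex values and messages can be turned into a rough first guess. The sketch below is a hypothetical back-of-envelope heuristic, not Giraph's own sizing logic: it splits one worker's heap between partitions and messages using estimates you must measure yourself (average partition size, average in-memory message size), and it assumes both limits apply per worker.

```shell
#!/bin/sh
# Rough, hypothetical sizing heuristic (not Giraph's own method): split one
# worker's heap budget between in-memory partitions and in-memory messages,
# then print matching -D options. All inputs are positive integer estimates.
suggest_ooc_limits() {
  heap_mb=$1         # heap available to one worker, in MB
  partition_mb=$2    # estimated size of one partition, in MB
  message_bytes=$3   # estimated size of one message in memory, in bytes
  msg_share_pct=$4   # percentage of the heap reserved for messages

  part_budget_mb=$(( heap_mb * (100 - msg_share_pct) / 100 ))
  msg_budget_bytes=$(( heap_mb * 1024 * 1024 * msg_share_pct / 100 ))

  max_partitions=$(( part_budget_mb / partition_mb ))
  # Keep at least one partition in memory.
  if [ "$max_partitions" -lt 1 ]; then
    max_partitions=1
  fi
  max_messages=$(( msg_budget_bytes / message_bytes ))

  echo "-Dgiraph.maxPartitionsInMemory=$max_partitions -Dgiraph.maxMessagesInMemory=$max_messages"
}
```

For example, `suggest_ooc_limits 1024 100 64 25` (1 GB heap, 100 MB partitions, 64-byte messages, a quarter of the heap for messages) prints `-Dgiraph.maxPartitionsInMemory=7 -Dgiraph.maxMessagesInMemory=4194304`. The output is only a starting point for the per-algorithm tuning the docs describe.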


On Wed, Oct 16, 2013 at 9:41 PM, Jianqiang Ou <oujianqiangooy@gmail.com> wrote:
Hi Sundi,

I just tried your method, but somehow the job failed; attached is the history of the job. It ran fine without the out-of-core options. Do you have any clue why?

The command I used to run the program is below:

$HADOOP_HOME/bin/hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dgiraph.useOutOfCoreMessages=true \
  -Dgiraph.useOutOfCoreGraph=true \
  org.apache.giraph.examples.SimplePageRankComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/andy/input/tiny_graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/andy/output/page3 \
  -w 3 \
  -mc org.apache.giraph.examples.SimplePageRankComputation\$SimplePageRankMasterCompute

Many thanks,

Jianqiang

On Wed, Oct 16, 2013 at 12:11 PM, Jianqiang Ou <oujianqiangooy@gmail.com> wrote:
Got it, thank you very much!


On Wed, Oct 16, 2013 at 10:43 AM, Jyotirmoy Sundi <sundi133@gmail.com> wrote:
Put it as -Dgiraph.useOutOfCoreMessages=true -Dgiraph.useOutOfCoreGraph=true right after GiraphRunner, like this:
hadoop jar giraph.jar org.apache.giraph.GiraphRunner -Dgiraph.useOutOfCoreMessages=true -Dgiraph.useOutOfCoreGraph=true ...
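A small sketch of why the position matters, assuming (as we understand it) that GiraphRunner is a Hadoop `Tool` launched through `ToolRunner`, whose generic-option parser stops at the first argument that is not an option. Under that assumption, a `-D` option placed after the computation class is not picked up as a configuration setting. The hypothetical `check_giraph_arg_order` below flags that case; it only handles the compact `-Dkey=value` form.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every -Dkey=value option that follows
# GiraphRunner appears before the first bare word (the computation class).
check_giraph_arg_order() {
  seen_runner=0
  seen_computation=0
  for arg in "$@"; do
    if [ "$seen_runner" -eq 0 ]; then
      # Skip everything up to and including the GiraphRunner class name.
      [ "$arg" = "org.apache.giraph.GiraphRunner" ] && seen_runner=1
      continue
    fi
    case $arg in
      -D*)
        if [ "$seen_computation" -eq 1 ]; then
          echo "misplaced option: $arg (comes after the computation class)" >&2
          return 1
        fi
        ;;
      -*) ;;  # other GiraphRunner options such as -vif or -w
      *)
        # The first bare word after GiraphRunner is the computation class.
        seen_computation=1
        ;;
    esac
  done
  return 0
}
```

For example, `check_giraph_arg_order hadoop jar giraph.jar org.apache.giraph.GiraphRunner -Dgiraph.useOutOfCoreGraph=true org.apache.giraph.examples.SimplePageRankComputation -w 3` succeeds, while moving the `-D` option after `SimplePageRankComputation` makes it fail with a message on stderr.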




On Wed, Oct 16, 2013 at 7:29 AM, Jianqiang Ou <oujianqiangooy@gmail.com> wrote:
Hi, I have a question about out-of-core Giraph. It is said that, in order to use disk to store the partitions, we need to use "giraph.useOutOfCoreGraph=true", but where should I put this setting?
BTW, I am just trying to use the PageRank or shortest-paths example to test the out-of-core performance of my cluster.

Thanks very much,
Jian



--
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius

San Francisco, CA 94158


