giraph-user mailing list archives

From Jianqiang Ou <oujianqiang...@gmail.com>
Subject Re: how to use out of core options
Date Sat, 19 Oct 2013 19:51:51 GMT
Hi Claudio,

The Hadoop version should be 0.20.203.0, but I am not quite sure about
the Giraph version. I got it from:

git clone https://github.com/apache/giraph.git

The command I used is something like the one below. I might also have
used the giraph.maxPartitionsInMemory=1 option at that time, but with
or without this option, it did not work.

$HADOOP_HOME/bin/hadoop jar \
  $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dgiraph.useOutOfCoreMessages=true \
  -Dgiraph.useOutOfCoreGraph=true \
  org.apache.giraph.examples.SimplePageRankComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/andy/input/tiny_graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/andy/output/page6 \
  -w 3 \
  -mc org.apache.giraph.examples.SimplePageRankComputation\$SimplePageRankMasterCompute
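Since the thread reports that the job runs with only -Dgiraph.useOutOfCoreMessages=true but fails once -Dgiraph.useOutOfCoreGraph=true is added, one way to narrow the failure down is to run the same job once per option combination. A minimal bash sketch, assuming HADOOP_HOME and GIRAPH_HOME are set and reusing the jar and HDFS paths from the command above; the `pagerank_cmd` helper, the `DRY_RUN` switch and the `ooc-*` output directories are hypothetical, added only for illustration. By default it only prints each command.

```shell
# Hypothetical helper for this sketch; assumes HADOOP_HOME and GIRAPH_HOME
# are set and reuses the jar and input paths from the command above.
JAR="$GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar"

# pagerank_cmd OUTPUT_DIR [-Dkey=value ...]
# Prints the command when DRY_RUN is unset or 1; runs it when DRY_RUN=0.
pagerank_cmd() {
  local out="$1"
  shift
  local cmd
  # The -D options ("$@") sit right after GiraphRunner, before the computation class.
  cmd=("$HADOOP_HOME/bin/hadoop" jar "$JAR" org.apache.giraph.GiraphRunner
    "$@"
    org.apache.giraph.examples.SimplePageRankComputation
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
    -vip /user/andy/input/tiny_graph.txt
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat
    -op "$out" -w 3
    -mc 'org.apache.giraph.examples.SimplePageRankComputation$SimplePageRankMasterCompute')
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # dry run: print only
  else
    "${cmd[@]}"                 # real run
  fi
}

# One run per option combination, each writing to its own (hypothetical) output dir.
pagerank_cmd /user/andy/output/ooc-msgs -Dgiraph.useOutOfCoreMessages=true
pagerank_cmd /user/andy/output/ooc-graph -Dgiraph.useOutOfCoreGraph=true
pagerank_cmd /user/andy/output/ooc-both -Dgiraph.useOutOfCoreMessages=true -Dgiraph.useOutOfCoreGraph=true
```

Running it with `DRY_RUN=0` executes the three jobs one after another; a distinct output directory per run avoids failures from an already existing output path.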

Thanks,
Jian



On Sat, Oct 19, 2013 at 11:21 AM, Claudio Martella <
claudio.martella@gmail.com> wrote:

> Looking at your logs, there's a NullPointerException. It looks like a bug
> to me. What version are you running? What command are you using to run the
> job?
>
>
> On Fri, Oct 18, 2013 at 9:03 AM, Jianqiang Ou <oujianqiangooy@gmail.com> wrote:
>
>> Thanks. I just tried another dataset, one that my cluster can handle
>> entirely in memory. However, exceptions still occurred with the
>> -Dgiraph.useOutOfCoreGraph=true option, while the job works fine with only the
>> -Dgiraph.useOutOfCoreMessages=true option. Do you still think it is a
>> directory permission issue?
>>
>> By the way, the directory path you mentioned is the local file system
>> directory where the out-of-core partitions and messages are stored, right?
>> But how do I know where it is? It is determined by Giraph rather than by
>> the application, right?
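As far as I know, the spill locations are configurable rather than fixed. This sketch assumes the property names giraph.partitionsDirectory and giraph.messagesDirectory, which Giraph builds of that period define in GiraphConstants; verify them against your own build. My understanding is that their defaults are relative paths, so the spill files end up under each worker task's local working directory; passing explicit absolute paths makes them easy to find and to check. The `ooc_dir_opts` helper and the `/tmp/giraph-ooc` base path are hypothetical.

```shell
# Hypothetical helper: print -D options that pin the out-of-core spill
# directories under BASE. Assumes the property names giraph.partitionsDirectory
# and giraph.messagesDirectory exist in your Giraph build.
ooc_dir_opts() {
  local base="$1"   # local path; must exist and be writable on every worker node
  printf '%s\n' \
    "-Dgiraph.partitionsDirectory=$base/partitions" \
    "-Dgiraph.messagesDirectory=$base/messages"
}

ooc_dir_opts /tmp/giraph-ooc
```

The printed options would go right after org.apache.giraph.GiraphRunner, next to -Dgiraph.useOutOfCoreGraph=true and -Dgiraph.useOutOfCoreMessages=true.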
>>
>> Thanks for your time and patience again,
>> Jian
>>
>>
>> On Thu, Oct 17, 2013 at 5:32 PM, Jyotirmoy Sundi <sundi133@gmail.com> wrote:
>>
>>> Apart from these, you might also want to check the permissions of the
>>> directory path where vertices and messages are offloaded.
>>> Ideally, Giraph is not meant to run out-of-core: if your graph is much
>>> bigger than what the cluster can hold in memory, using Giraph defeats the
>>> purpose.
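To check the directory permissions mentioned above, one could run a small probe on each worker node as the user that executes the Hadoop tasks. A minimal bash sketch; the `check_spill_dir` helper and the `/tmp/giraph-ooc` path are hypothetical, and the probe only tests local create/write access, not free disk space.

```shell
# Hypothetical helper: returns 0 if the current user can create DIR (if it
# does not exist yet) and write a file into it, 1 otherwise.
check_spill_dir() {
  local dir="$1" probe
  mkdir -p "$dir" 2>/dev/null || return 1
  probe="$dir/.giraph-write-test.$$"
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"   # clean up the probe file
    return 0
  fi
  return 1
}

if check_spill_dir /tmp/giraph-ooc; then
  echo "writable"
else
  echo "NOT writable"
fi
```

Running it as a different user than the one the TaskTracker uses for tasks can give a misleading result, so it is worth checking which account the tasks actually run under.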
>>>
>>>
>>>
>>> On Thu, Oct 17, 2013 at 8:13 AM, Jianqiang Ou <oujianqiangooy@gmail.com> wrote:
>>>
>>>> Thanks very much. So are you saying that if I use -Dgiraph.maxPartitionsInMemory
>>>> and -Dgiraph.maxMessagesInMemory to set both to smaller numbers,
>>>> it might work?
>>>>
>>>> Thanks again,
>>>> Jian
>>>>
>>>>
>>>> On Thu, Oct 17, 2013 at 12:56 AM, Jyotirmoy Sundi <sundi133@gmail.com> wrote:
>>>>
>>>>> You need to tune it for your cluster. This is what is mentioned in the
>>>>> docs:
>>>>> *"It is difficult to decide a general policy to use out-of-core
>>>>> capabilities*, as it depends on the behavior of the algorithm and the
>>>>> input graph. The exact number of partitions and messages to keep in memory
>>>>> depends on the cluster capabilities, the number of messages produced per
>>>>> superstep, and number of active vertices per superstep. Moreover, it
>>>>> depends on the type and size of vertex values and messages. For example,
>>>>> algorithms such as Belief Propagation tend to keep large vertex values,
>>>>> while algorithms such as clique computations tend to send large messages
>>>>> along. Hence, it depends on your algorithm what feature to rely on more."
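Because the right values depend on the cluster and the algorithm, tuning usually means trying a few settings and comparing runs. A minimal bash sketch that prints one set of out-of-core -D options per candidate pair of giraph.maxPartitionsInMemory and giraph.maxMessagesInMemory (the two options named in this thread); the `ooc_tuning_opts` helper and the candidate numbers are hypothetical placeholders to sweep, not recommended values.

```shell
# Hypothetical helper: print the out-of-core -D options for one candidate
# setting of max partitions and max messages kept in memory.
ooc_tuning_opts() {
  local max_parts="$1" max_msgs="$2"
  printf '%s\n' "-Dgiraph.useOutOfCoreGraph=true -Dgiraph.maxPartitionsInMemory=$max_parts -Dgiraph.useOutOfCoreMessages=true -Dgiraph.maxMessagesInMemory=$max_msgs"
}

# Placeholder grid: smaller numbers keep less in memory and spill more to disk.
for parts in 1 2 4; do
  for msgs in 100000 1000000; do
    ooc_tuning_opts "$parts" "$msgs"
  done
done
```

Each printed line would be placed right after org.apache.giraph.GiraphRunner in a separate run, ideally with its own output directory, so the runs can be compared side by side.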
>>>>>
>>>>> Thanks
>>>>>  Sundi
>>>>>
>>>>>
>>>>> On Wed, Oct 16, 2013 at 9:41 PM, Jianqiang Ou <
>>>>> oujianqiangooy@gmail.com> wrote:
>>>>>
>>>>>> Hi Sundi,
>>>>>>
>>>>>> I just tried your method, but somehow the job failed; the job history
>>>>>> is attached. It ran fine without the out-of-core options. Do
>>>>>> you have any idea why?
>>>>>>
>>>>>> The command I used to run the program is below:
>>>>>>
>>>>>> $HADOOP_HOME/bin/hadoop jar
>>>>>> $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>>>>> org.apache.giraph.GiraphRunner
>>>>>> -Dgiraph.useOutOfCoreMessages=true -Dgiraph.useOutOfCoreGraph=true
>>>>>> org.apache.giraph.examples.SimplePageRankComputation -vif
>>>>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>>>>> -vip /user/andy/input/tiny_graph.txt -vof
>>>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>>>>> /user/andy/output/page3 -w 3 -mc
>>>>>> org.apache.giraph.examples.SimplePageRankComputation\$SimplePageRankMasterCompute
>>>>>>
>>>>>> Many thanks,
>>>>>>
>>>>>> Jianqiang
>>>>>>
>>>>>> On Wed, Oct 16, 2013 at 12:11 PM, Jianqiang Ou <
>>>>>> oujianqiangooy@gmail.com> wrote:
>>>>>>
>>>>>>> got it, thank you very much!
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 16, 2013 at 10:43 AM, Jyotirmoy Sundi <
>>>>>>> sundi133@gmail.com> wrote:
>>>>>>>
>>>>>>>> Put -Dgiraph.useOutOfCoreMessages=true
>>>>>>>> -Dgiraph.useOutOfCoreGraph=true right after GiraphRunner,
>>>>>>>> like
>>>>>>>> hadoop jar giraph.jar org.apache.giraph.GiraphRunner -Dgiraph.useOutOfCoreMessages=true
>>>>>>>> -Dgiraph.useOutOfCoreGraph=true ...
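Sundi's placement advice is consistent with how, as far as I understand, Hadoop parses generic options: GiraphRunner is launched through ToolRunner, whose GenericOptionsParser only picks up -D options that appear before the first non-option argument, which here is the computation class. The bash function below is a hypothetical heuristic check, not part of Giraph or Hadoop; it only understands -D (both the -Dkey=value and -D key=value forms) and would treat the value of any other generic option placed before the computation class as a positional argument.

```shell
# Hypothetical lint: returns 0 if every -D option comes before the first
# positional argument after GiraphRunner (normally the computation class),
# 1 otherwise. Only -D is recognized; other options' values count as positional.
check_d_placement() {
  local seen_runner=0 seen_positional=0 skip_next=0 arg
  for arg in "$@"; do
    if [ "$skip_next" = 1 ]; then skip_next=0; continue; fi
    if [ "$seen_runner" = 0 ]; then
      # Ignore "hadoop jar <jar>" until the runner class shows up.
      [ "$arg" = org.apache.giraph.GiraphRunner ] && seen_runner=1
      continue
    fi
    case "$arg" in
      -D)  [ "$seen_positional" = 1 ] && return 1; skip_next=1 ;;  # "-D key=value"
      -D*) [ "$seen_positional" = 1 ] && return 1 ;;               # "-Dkey=value"
      -*)  ;;                                                      # other options
      *)   seen_positional=1 ;;                                    # e.g. computation class
    esac
  done
  return 0
}

check_d_placement hadoop jar giraph.jar org.apache.giraph.GiraphRunner \
  -Dgiraph.useOutOfCoreGraph=true org.apache.giraph.examples.SimplePageRankComputation \
  && echo "ok: -D options before the computation class"
```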
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 16, 2013 at 7:29 AM, Jianqiang Ou <
>>>>>>>> oujianqiangooy@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, I have a question about out-of-core Giraph. It is said
>>>>>>>>> that, in order to use disk to store the partitions, we need to use
>>>>>>>>> "giraph.useOutOfCoreGraph=true", but where should I put this
>>>>>>>>> statement?
>>>>>>>>>
>>>>>>>>> BTW, I am just trying to use the PageRank or shortest-path example
>>>>>>>>> to test the out-of-core performance of my cluster.
>>>>>>>>>
>>>>>>>>> Thanks very much,
>>>>>>>>> Jian
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Jyotirmoy Sundi
>>>>>>>> Data Engineer,
>>>>>>>> Admobius
>>>>>>>>
>>>>>>>> San Francisco, CA 94158
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com
>
