giraph-user mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: How to specify parameters in order to run giraph job in parallel
Date Thu, 17 Oct 2013 15:26:39 GMT
It actually depends on the setup of your cluster.

Ideally, with 15 nodes (tasktrackers) you'd want 1 mapper slot per node
(dedicated to running giraph), so that you would have 14 workers, one per
computing node, plus one mapper for master+zookeeper. Once you have that, you
would set the number of compute threads equal to the number of threads that
you can run on each node (24 in your case).
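
As a rough sketch of the sizing above, the two values can be derived from the
cluster shape and dropped into the command from the original question. It
assumes one mapper slot per node, and it assumes the generic -D option goes
right after the GiraphRunner class name (the option placement is an assumption,
not something stated in this thread). The script only prints the command so it
can be checked before running it.

```shell
#!/bin/sh
# Sketch only: derive -w and giraph.numComputeThreads from the cluster shape.
NODES=15             # tasktrackers, assuming 1 mapper slot per node
THREADS_PER_NODE=24  # hardware threads per node (2 x Xeon E5-2620 with HT)

# One mapper is used by master+zookeeper; the remaining mappers are workers.
WORKERS=$((NODES - 1))

# Build the command as a string; placing -D right after the class name is
# an assumption about how the generic option is passed.
CMD="hadoop jar ~/giraph-examples.jar org.apache.giraph.GiraphRunner"
CMD="$CMD -Dgiraph.numComputeThreads=$THREADS_PER_NODE"
CMD="$CMD SimplePageRank -vif PageRankInputFormat -vip /input"
CMD="$CMD -vof PageRankOutputFormat -op /pagerank -w $WORKERS"
CMD="$CMD -mc 'SimplePageRank\$SimplePageRankMasterCompute'"
CMD="$CMD -wc 'SimplePageRank\$SimplePageRankWorkerContext'"

# Print instead of executing, so the result can be inspected first.
echo "$CMD"
```

If 24 threads per worker turns out slower than expected, the thread count is
the first value to lower and re-test, since the right number depends on the
job and the cluster.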

Does this make sense to you?


On Thu, Oct 17, 2013 at 5:04 PM, Yi Lu <luyi0619@gmail.com> wrote:

> Hi,
>
> I have a computer cluster consisting of 15 slave machines and 1 master
> machine.
>
> On each slave machine, there are two Xeon E5-2620 CPUs. With the help of
> HT, there are 24 threads.
>
> I am wondering how to specify parameters in order to run a giraph job in
> parallel on my cluster.
>
> I am using the following parameters to run a pagerank algorithm.
>
> hadoop jar ~/giraph-examples.jar org.apache.giraph.GiraphRunner
> SimplePageRank -vif PageRankInputFormat -vip /input -vof
> PageRankOutputFormat -op /pagerank -w 1 -mc
> SimplePageRank\$SimplePageRankMasterCompute -wc
> SimplePageRank\$SimplePageRankWorkerContext
>
> In particular,
>
> 1) I know I can use “-w” to specify the number of workers. As I understand
> it, the number of workers equals the number of mappers in hadoop, excluding
> zookeeper. Therefore, in my case (15 slave machines), which number should be
> chosen? Is 15 a good choice? I ask because if I input a large number, e.g.
> 100, the mappers hang.
>
> 2) I know I can use “-Dgiraph.numComputeThreads=1” to specify the number of
> vertex compute threads. However, if I set it to 10, the total runtime
> is much longer than with the default. I think the default is 1, based on
> the source code. If I want to use this parameter, which value
> should I choose?
>
> 3) When the giraph job is running, I use the “top” command to monitor CPU
> usage on the slave machines. I find that the java process uses 200%-300%
> CPU. However, if I change the number of vertex compute threads to
> 10, the java process uses 800% CPU. I think it is not a linear
> relation and I want to know why.
>
>
> Thanks for your help.
>
> Best,
>
> -Yi
>



-- 
   Claudio Martella
   claudio.martella@gmail.com
