giraph-user mailing list archives

From "YAN Da" <ya...@ust.hk>
Subject Re: How to specify parameters in order to run giraph job in parallel
Date Fri, 18 Oct 2013 17:12:36 GMT
Dear Claudio Martella,

I don't quite get what you mean. Our cluster has 15 servers, each with 24
hardware threads, so ideally 15*24 threads/partitions could work in
parallel, right? (Perhaps minus one server for ZooKeeper.)
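
To make the arithmetic concrete, here is a rough sketch of the layout I have
in mind. It assumes one node is set aside for the master and ZooKeeper and
that every remaining node runs a single worker. The jar path and class names
are copied from Yi Lu's command quoted below, the -mc/-wc options are left
out for brevity, and the script only prints the command rather than running
it. These values are not confirmed to work on our cluster.

```shell
#!/bin/sh
# Sizing sketch (assumed values taken from this thread, not measured).
NODES=15              # servers in the cluster
THREADS_PER_NODE=24   # hardware threads per server

# Assumption: one node is reserved for master + ZooKeeper,
# and each remaining node hosts exactly one Giraph worker.
WORKERS=$((NODES - 1))
TOTAL_THREADS=$((WORKERS * THREADS_PER_NODE))

echo "workers=$WORKERS threads_per_worker=$THREADS_PER_NODE total=$TOTAL_THREADS"

# Print (do not run) the matching invocation. The -D option is placed
# right after the GiraphRunner class name so that Hadoop's generic
# option parsing can pick it up.
CMD="hadoop jar ~/giraph-examples.jar org.apache.giraph.GiraphRunner"
CMD="$CMD -Dgiraph.numComputeThreads=$THREADS_PER_NODE"
CMD="$CMD SimplePageRank -vif PageRankInputFormat -vip /input"
CMD="$CMD -vof PageRankOutputFormat -op /pagerank -w $WORKERS"
echo "$CMD"
```

With the numbers above this gives 14 workers and 336 compute threads in
total, which is the level of parallelism I was hoping to reach.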

However, when we set the "-Dgiraph.numComputeThreads" option, we find that
we cannot use even 20 threads, and when it is set to 10, the CPU usage is
only a little more than double that of the default setting, nowhere near
100*numComputeThreads%.

How can we configure it on our servers to utilize all the processors?

Regards,
Da Yan

> It actually depends on the setup of your cluster.
>
> Ideally, with 15 nodes (tasktrackers) you'd want 1 mapper slot per node
> (ideally to run giraph), so that you would have 14 workers, one per
> computing node, plus one for master+zookeeper. Once that is reached, you
> would set the number of compute threads equal to the number of threads
> that you can run on each node (24 in your case).
>
> Does this make sense to you?
>
>
> On Thu, Oct 17, 2013 at 5:04 PM, Yi Lu <luyi0619@gmail.com> wrote:
>
>> Hi,
>>
>> I have a computer cluster consisting of 15 slave machines and 1 master
>> machine.
>>
>> On each slave machine, there are two Xeon E5-2620 CPUs. With the help of
>> HT, there are 24 threads.
>>
>> I am wondering how to specify parameters in order to run giraph job in
>> parallel on my cluster.
>>
>> I am using the following parameters to run a pagerank algorithm.
>>
>> hadoop jar ~/giraph-examples.jar org.apache.giraph.GiraphRunner
>> SimplePageRank -vif PageRankInputFormat -vip /input -vof
>> PageRankOutputFormat -op /pagerank -w 1 -mc
>> SimplePageRank\$SimplePageRankMasterCompute -wc
>> SimplePageRank\$SimplePageRankWorkerContext
>>
>> In particular,
>>
>> 1) I know I can use “-w” to specify the number of workers. As I
>> understand it, the number of workers equals the number of mappers in
>> hadoop, excluding zookeeper. Therefore, in my case (15 slave machines),
>> which number should be chosen? Is 15 a good choice? I ask because if I
>> input a large number, e.g. 100, the mappers hang.
>>
>> 2) I know I can use “-Dgiraph.numComputeThreads=1” to specify the number
>> of vertex computing threads. However, if I set it to 10, the total
>> runtime is much longer than with the default. I think the default is 1,
>> judging from the source code. If I want to use this parameter, which
>> number should be chosen?
>>
>> 3) When the giraph job is running, I use the “top” command to monitor
>> cpu usage on the slave machines. I find that the java process uses
>> 200%-300% cpu. However, if I change the number of vertex computing
>> threads to 10, the java process uses 800% cpu. This is not a linear
>> relation, and I want to know why.
>>
>>
>> Thanks for your help.
>>
>> Best,
>>
>> -Yi
>>
>
>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com
>


