giraph-user mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: How to specify parameters in order to run giraph job in parallel
Date Sat, 19 Oct 2013 15:18:31 GMT
How many mapper tasks do you have configured for each node? How many workers
are you using for Giraph?
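
A quick way to answer those two questions on a Hadoop 1.x / MR1 cluster
(sketch only; the config path below is installation-specific and the property
name may differ on newer Hadoop versions):

    # How many map slots is each tasktracker configured with?
    # (Hadoop 1.x / MR1 property; adjust the path to your installation.)
    grep -A1 'mapred.tasktracker.map.tasks.maximum' /etc/hadoop/conf/mapred-site.xml

    # How many map tasks (Giraph workers plus the master) are actually
    # running on a given slave while the job is up?
    ps aux | grep -c '[o]rg.apache.hadoop.mapred.Child'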


On Fri, Oct 18, 2013 at 7:12 PM, YAN Da <yanda@ust.hk> wrote:

> Dear Claudio Martella,
>
> I don't quite get what you mean. Our cluster has 15 servers, each with 24
> cores, so ideally there can be 15*24 threads/partitions working in parallel,
> right? (Perhaps minus one for ZooKeeper.)
>
> However, when we set the "-Dgiraph.numComputeThreads" option, we find that
> we cannot even get 20 threads running, and when it is set to 10, the CPU
> usage only roughly doubles compared to the default setting, nowhere near
> 100*numComputeThreads%.
>
> How can we configure it on our servers to utilize all the processors?
>
> Regards,
> Da Yan
>
> > It actually depends on the setup of your cluster.
> >
> > Ideally, with 15 nodes (tasktrackers) you'd want 1 mapper slot per node
> > dedicated to running Giraph, so that you would have 14 workers, one per
> > compute node, plus one for the master + ZooKeeper. Once that is in place,
> > you would set the number of compute threads equal to the number of
> > threads you can run on each node (24 in your case).
> >
> > Does this make sense to you?
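
To make that sizing concrete, the invocation quoted further down would look
roughly like the sketch below with one worker per remaining node and one
compute thread per hardware thread. The jar path and class names are taken
from Yi Lu's command; placing the -D option before the computation class (so
Hadoop's generic option parser picks it up via ToolRunner) is an assumption
to verify against your Giraph version.

    # 14 workers (one per node) + 1 map task for master/ZooKeeper,
    # and 24 compute threads per worker.
    hadoop jar ~/giraph-examples.jar org.apache.giraph.GiraphRunner \
        -Dgiraph.numComputeThreads=24 \
        SimplePageRank \
        -vif PageRankInputFormat -vip /input \
        -vof PageRankOutputFormat -op /pagerank \
        -w 14 \
        -mc SimplePageRank\$SimplePageRankMasterCompute \
        -wc SimplePageRank\$SimplePageRankWorkerContext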
> >
> >
> > On Thu, Oct 17, 2013 at 5:04 PM, Yi Lu <luyi0619@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I have a computer cluster consisting of 15 slave machines and 1 master
> >> machine.
> >>
> >> On each slave machine there are two Xeon E5-2620 CPUs; with
> >> hyper-threading, that gives 24 hardware threads.
> >>
> >> I am wondering how to specify the parameters in order to run a Giraph
> >> job in parallel on my cluster.
> >>
> >> I am using the following parameters to run a pagerank algorithm.
> >>
> >> hadoop jar ~/giraph-examples.jar org.apache.giraph.GiraphRunner
> >> SimplePageRank -vif PageRankInputFormat -vip /input -vof
> >> PageRankOutputFormat -op /pagerank -w 1 -mc
> >> SimplePageRank\$SimplePageRankMasterCompute -wc
> >> SimplePageRank\$SimplePageRankWorkerContext
> >>
> >> In particular,
> >>
> >> 1) I know I can use “-w” to specify the number of workers. As I
> >> understand it, the number of workers equals the number of mappers in
> >> Hadoop, excluding ZooKeeper. Therefore, in my case (15 slave machines),
> >> which number should be chosen? Is 15 a good choice? I find that if I
> >> specify a large number, e.g. 100, the mappers hang.
> >>
> >> 2) I know I can use “-Dgiraph.numComputeThreads=1” to specify the number
> >> of vertex compute threads. However, if I set it to 10, the total runtime
> >> is much longer than with the default. I believe the default is 1, based
> >> on the source code. If I want to use this parameter, which number should
> >> be chosen?
> >>
> >> 3) When the Giraph job is running, I use the “top” command to monitor
> >> CPU usage on the slave machines. I find that the Java process uses
> >> 200%-300% CPU. However, if I change the number of vertex compute threads
> >> to 10, the Java process uses about 800% CPU. That is not a linear
> >> relationship, and I would like to know why.
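
For question 3, a small sketch of how to see where those percentages come
from on a slave; it assumes a Hadoop 1.x setup where map tasks run in
org.apache.hadoop.mapred.Child JVMs, so adjust the pattern if your task JVMs
are named differently:

    # Find the task JVM and show its per-thread CPU usage.
    WORKER_PID=$(pgrep -f 'org.apache.hadoop.mapred.Child' | head -n 1)
    top -H -p "$WORKER_PID"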
> >>
> >>
> >> Thanks for your help.
> >>
> >> Best,
> >>
> >> -Yi
> >>
> >
> >
> >
> > --
> >    Claudio Martella
> >    claudio.martella@gmail.com
> >
>
>
>


-- 
   Claudio Martella
   claudio.martella@gmail.com
