singa-dev mailing list archives

From Prasanna Balaprakash <pbala...@mcs.anl.gov>
Subject Re: Training SINGA with a cluster
Date Wed, 13 Jul 2016 04:27:13 GMT
Sheng,

Thanks a lot for your reply! 

> 1. To run singa in cluster,
> you need to set the zookeeper location in "conf/singa.conf" file.
> Just replace "zookeeper_host" field with the zookeeper service you are using.

This is my job script:

#!/bin/sh


cat $COBALT_NODEFILE > /home/pbalapra/Projects/incubator-singa/conf/hostfile

./bin/zk-service.sh start

./bin/singa-run.sh -conf examples/cifar10/job.conf

./bin/zk-service.sh stop


I am starting zookeeper within the job script, which gives the following message. Could you
please let me know how I can specify the location?

JMX enabled by default
Using config: /gpfs/mira-home/pbalapra/Projects/incubator-singa/thirdparty/zookeeper-3.4.6/bin/../conf/zoo.cfg
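
A minimal sketch of how the job script might fill in the "zookeeper_host" field from point 1. It rests on assumptions not confirmed in this thread: that conf/singa.conf contains a line of the form zookeeper_host: "host:port", that ZooKeeper listens on its default client port 2181, and that the server started by ./bin/zk-service.sh runs on the first node listed in $COBALT_NODEFILE. The function name set_zk_host is hypothetical.

```sh
#!/bin/sh
# Sketch: point SINGA at a ZooKeeper server on the first allocated node.
# Assumptions (not confirmed in this thread): conf/singa.conf holds a line
# of the form  zookeeper_host: "host:port"  and ZooKeeper uses its default
# client port 2181. set_zk_host is a hypothetical helper, not part of SINGA.

# set_zk_host NODEFILE CONF [PORT]
# Rewrites the zookeeper_host line of CONF to use the first host in NODEFILE.
set_zk_host() {
    nodefile=$1
    conf=$2
    port=${3:-2181}
    # The first line of the node file is taken as the ZooKeeper host.
    zk_host=$(head -n 1 "$nodefile")
    [ -n "$zk_host" ] || return 1
    # Portable in-place edit: write to a temp file, then move it over CONF.
    sed "s|^zookeeper_host:.*|zookeeper_host: \"${zk_host}:${port}\"|" \
        "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Inside the COBALT job this would run before ./bin/zk-service.sh start.
if [ -n "${COBALT_NODEFILE:-}" ]; then
    set_zk_host "$COBALT_NODEFILE" conf/singa.conf
fi
```

Outside a COBALT job the last block does nothing, so the script can be tried safely against a copy of the config file, e.g. set_zk_host nodes.txt singa.conf.copy.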


Thanks a lot
Prasanna


> On Mon, Jul 11, 2016 at 4:04 AM, Prasanna Balaprakash <pbalapra@mcs.anl.gov> wrote:
> Dear developers,
> 
> I am trying to run SINGA in a cluster environment with ~100 hybrid (CPU+GPU) nodes.
> 
> 
> I started with a single-node experiment.
> 
> As per the instructions, in my COBALT job script I use "cat $COBALT_NODEFILE > conf/hostfile",
> where $COBALT_NODEFILE gives the list of nodes allocated by COBALT.
> 
> I am not sure how to set the zookeeper location!
> 
> Also, how can I verify whether the GPU is used? This is the log:
> 
> E0710 18:39:23.837704 72213 cluster.cc:50] proc #0 -> localhost:0 (pid = 72213)
> E0710 18:39:23.898723 72241 server.cc:64] Server (group = 0, id = 0) start
> E0710 18:39:24.898967 72242 worker.cc:79] Worker (group = 0, id = 0) start on CPU
> 
> From this log file it seems that only the CPU is used.
> 
> Thanks
> Prasanna
> 
> 
> 

