singa-dev mailing list archives

From WANG Sheng <wan...@comp.nus.edu.sg>
Subject Re: Training SINGA with a cluster
Date Wed, 13 Jul 2016 10:08:05 GMT
Do you mean the location of the zookeeper config file?

Actually, that config path is managed by zk itself, and singa does not need
to read it.

If you have special needs for the zk config, I recommend copying your zk
config file into the zk folder (its conf/zoo.cfg, the file named in the
"Using config" line of your log) instead of specifying another location.
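For example, the copy step can be scripted as below. This is a minimal sketch, assuming ZK_HOME is the ZooKeeper directory bundled with SINGA (thirdparty/zookeeper-3.4.6 in the log quoted below), whose zkServer.sh reads conf/zoo.cfg by default; install_zk_config is a hypothetical helper name, not part of SINGA.

```shell
#!/bin/sh
# Sketch: install a custom ZooKeeper config where zkServer.sh looks by default.
# Assumption: the second argument is ZK_HOME, e.g. thirdparty/zookeeper-3.4.6,
# whose default config file is $ZK_HOME/conf/zoo.cfg.
install_zk_config() {
  src=$1       # your own zoo.cfg
  zk_home=$2   # ZooKeeper installation directory
  if [ ! -f "$src" ]; then
    echo "config file not found: $src" >&2
    return 1
  fi
  mkdir -p "$zk_home/conf"
  # Keep a backup of any existing config before overwriting it.
  if [ -f "$zk_home/conf/zoo.cfg" ]; then
    cp "$zk_home/conf/zoo.cfg" "$zk_home/conf/zoo.cfg.bak"
  fi
  cp "$src" "$zk_home/conf/zoo.cfg"
}
```

Usage would look like install_zk_config ~/my-zoo.cfg "$ZK_HOME" (the source path is only an example), after which zk-service.sh start picks up the new file. The previous config is kept as zoo.cfg.bak so the change is easy to undo.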

Launching and stopping zookeeper is actually independent of singa:
zk-service.sh just calls $ZK_HOME/bin/zkServer.sh to launch zk.
You can start/stop your zk directly using the scripts provided in $ZK_HOME/bin.
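A thin wrapper around those scripts might look like this. It is a sketch assuming ZK_HOME is set to the ZooKeeper installation; zk_ctl is a hypothetical name, while start, stop and status are standard zkServer.sh actions.

```shell
#!/bin/sh
# Sketch: control ZooKeeper directly through $ZK_HOME/bin/zkServer.sh,
# which is what zk-service.sh does under the hood.
# Assumption: ZK_HOME is set and points at the ZooKeeper installation.
zk_ctl() {
  action=$1   # start | stop | status
  case "$action" in
    start|stop|status) ;;
    *) echo "usage: zk_ctl start|stop|status" >&2; return 2 ;;
  esac
  if [ -z "$ZK_HOME" ] || [ ! -x "$ZK_HOME/bin/zkServer.sh" ]; then
    echo "ZK_HOME is unset or has no executable bin/zkServer.sh" >&2
    return 1
  fi
  # Delegate to the script shipped with ZooKeeper.
  "$ZK_HOME/bin/zkServer.sh" "$action"
}
```

Because the ZooKeeper instance is independent of the training job, it can be started once (zk_ctl start), left running for several singa-run.sh jobs, and checked with zk_ctl status.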

This zk can be shared with other applications. Singa only needs to know the
endpoint, i.e., set the zookeeper_host field in the "conf/singa.conf" file.
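In a job script, that field can be rewritten before launching singa-run.sh. A minimal sketch, assuming conf/singa.conf keeps the field in protobuf text format as a line such as zookeeper_host: "localhost:2181" (check your copy of the file); set_zookeeper_host is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: point SINGA at an existing ZooKeeper endpoint.
# Assumption: conf/singa.conf uses protobuf text format with a line like
#   zookeeper_host: "localhost:2181"
set_zookeeper_host() {
  conf=$1       # path to singa.conf
  endpoint=$2   # host:port of the ZooKeeper service, e.g. "zkhost:2181"
  tmpfile="$conf.tmp.$$"
  if grep -q '^[[:space:]]*zookeeper_host[[:space:]]*:' "$conf" 2>/dev/null; then
    # Replace the existing field (write to a temp file, then move it back).
    sed "s|^[[:space:]]*zookeeper_host[[:space:]]*:.*|zookeeper_host: \"$endpoint\"|" \
      "$conf" > "$tmpfile" && mv "$tmpfile" "$conf"
  else
    # Append the field if the file does not define it yet.
    printf 'zookeeper_host: "%s"\n' "$endpoint" >> "$conf"
  fi
}
```

For instance, set_zookeeper_host conf/singa.conf "$(hostname):2181" would point SINGA at a ZooKeeper started on the current node, assuming it listens on port 2181 (the clientPort setting in zoo.cfg).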

Regards,
Sheng


On Wed, Jul 13, 2016 at 12:27 PM, Prasanna Balaprakash <pbalapra@mcs.anl.gov> wrote:

> Sheng,
>
> Thanks a lot for your reply!
>
> > 1. To run singa in a cluster, you need to set the zookeeper location in
> > the "conf/singa.conf" file. Just replace the "zookeeper_host" field with
> > the zookeeper service you are using.
>
> This is my job script:
>
> #!/bin/sh
>
> cat $COBALT_NODEFILE > /home/pbalapra/Projects/incubator-singa/conf/hostfile
>
> ./bin/zk-service.sh start
>
> ./bin/singa-run.sh -conf examples/cifar10/job.conf
>
> ./bin/zk-service.sh stop
>
>
> I am starting zookeeper within the job script, which gives the following
> message. Could you please let me know how I can specify the location?
>
> JMX enabled by default
> Using config: /gpfs/mira-home/pbalapra/Projects/incubator-singa/thirdparty/zookeeper-3.4.6/bin/../conf/zoo.cfg
>
>
> Thanks a lot
> Prasanna
>
>
> > On Mon, Jul 11, 2016 at 4:04 AM, Prasanna Balaprakash <pbalapra@mcs.anl.gov> wrote:
> > Dear developers,
> >
> > I am trying to run SINGA in a cluster environment with ~100 hybrid
> > (CPU+GPU) nodes.
> >
> >
> > I started with a single-node experiment.
> >
> > As per the instructions, in my COBALT job script I use
> > "cat $COBALT_NODEFILE > conf/hostfile", where $COBALT_NODEFILE gives
> > the list of nodes allocated by COBALT.
> >
> > I am not sure how to set the zookeeper location!
> >
> > Also, how can I verify whether the GPU is used? My log shows:
> >
> > E0710 18:39:23.837704 72213 cluster.cc:50] proc #0 -> localhost:0 (pid = 72213)
> > E0710 18:39:23.898723 72241 server.cc:64] Server (group = 0, id = 0) start
> > E0710 18:39:24.898967 72242 worker.cc:79] Worker (group = 0, id = 0) start on CPU
> >
> > From this log, it seems only the CPU is used.
> >
> > Thanks
> > Prasanna
> >
> >
> >
>
>
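On the GPU question in the quoted message: one way to check is to look at the device each worker reports when it starts, as in the "start on CPU" line above. A minimal sketch, assuming you pass the path of the SINGA log file (its location depends on your setup); the function names are hypothetical, and the exact wording SINGA logs for a GPU worker is not shown in this thread, so the check only tests for "not CPU".

```shell
#!/bin/sh
# Sketch: report which device each SINGA worker started on, based on
# "Worker (...) start on <device>" lines like the one in the log above.
# Assumption: $1 is the path to the SINGA log file.
worker_devices() {
  log=$1
  # Print "Worker (group = G, id = I) -> <device>" for every worker start line.
  sed -n 's/.*\(Worker ([^)]*)\) start on \(.*\)$/\1 -> \2/p' "$log"
}

# Succeeds only if at least one worker reported a device other than CPU.
any_worker_off_cpu() {
  worker_devices "$1" | grep -v -q -- '-> CPU$'
}
```

While a job is running, nvidia-smi on the compute node also lists the processes currently using each GPU; a SINGA process missing from that list is another sign that training is on the CPU.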
