singa-dev mailing list archives

From WANG Sheng <wan...@comp.nus.edu.sg>
Subject Re: Training SINGA with a cluster
Date Mon, 11 Jul 2016 10:56:38 GMT
Hi Prasanna,

1. To run SINGA on a cluster,
you need to set the ZooKeeper location in the "conf/singa.conf" file.
Replace the value of the "zookeeper_host" field with the address of the
ZooKeeper service you are using.
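
As a rough illustration of the step above, here is a minimal Python sketch that rewrites the "zookeeper_host" field in the text of conf/singa.conf. It assumes the field is written on one line as zookeeper_host: "host:port"; the helper name set_zookeeper_host and the endpoint zk-node01:2181 are hypothetical, so substitute the address of your own ZooKeeper service.

```python
import re

# Assumption: the field appears on a single line as
#   zookeeper_host: "host:port"
_ZK_FIELD = re.compile(r'^([ \t]*zookeeper_host[ \t]*:[ \t]*).*$', re.MULTILINE)


def set_zookeeper_host(conf_text: str, endpoint: str) -> str:
    """Return conf_text with zookeeper_host pointing at endpoint.

    If the field is missing, it is appended at the end of the text.
    """
    new_text, count = _ZK_FIELD.subn(
        lambda m: f'{m.group(1)}"{endpoint}"', conf_text
    )
    if count == 0:
        # Field not present: add it on its own line.
        sep = "" if (not conf_text or conf_text.endswith("\n")) else "\n"
        new_text = f'{conf_text}{sep}zookeeper_host: "{endpoint}"\n'
    return new_text


if __name__ == "__main__":
    # Hypothetical endpoint; replace with your ZooKeeper service address.
    with open("conf/singa.conf") as f:
        updated = set_zookeeper_host(f.read(), "zk-node01:2181")
    with open("conf/singa.conf", "w") as f:
        f.write(updated)
```

Editing the file by hand works just as well; the script is only convenient when the ZooKeeper address changes between job allocations.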

2. To run on a GPU,
please make sure that the job configuration file on every GPU node
contains the following field:
gpu: <gpu id>
If you need to use multiple GPUs on a single node, please list each of them
in the configuration file, e.g.:
gpu: 0
gpu: 1
...

When the job is running on a GPU, you will see a line like the following
in the log files:
Worker (group = XXX, id = XXX) start on GPU XXX
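
To check this across many log files, a short Python sketch like the one below can list the device each worker reported at startup. It assumes the log lines follow the "Worker (group = ..., id = ...) start on ..." pattern shown in this thread and that the GPU id is printed as a number; the function name worker_devices is hypothetical.

```python
import re
import sys

# Pattern taken from the log lines quoted in this thread.
# Assumption: the GPU id after "GPU" is a decimal number.
WORKER_START = re.compile(
    r"Worker \(group = (\d+), id = (\d+)\) start on (GPU \d+|CPU)"
)


def worker_devices(log_text):
    """Return a list of (group, worker_id, device) tuples found in log_text."""
    # Collapse whitespace so messages that were wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [(int(g), int(w), dev) for g, w, dev in WORKER_START.findall(flat)]


if __name__ == "__main__":
    # Usage: python check_gpu.py <log file> [<log file> ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for group, worker, device in worker_devices(f.read()):
                print(f"{path}: group {group} worker {worker} -> {device}")
```

Any worker reported as "CPU" on a node that should use a GPU points to a job configuration file without the gpu field.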

Regards,
Sheng





On Mon, Jul 11, 2016 at 4:04 AM, Prasanna Balaprakash <pbalapra@mcs.anl.gov>
wrote:

> Dear developers,
>
> I am trying to run SINGA in a cluster environment with ~100 hybrid
> (CPU+GPU) nodes.
>
>
> I started with a single-node experiment.
>
> As per the instructions, in my COBALT job script I use "cat
> $COBALT_NODEFILE > conf/hostfile", where $COBALT_NODEFILE gives the
> list of nodes allocated by COBALT.
>
> I am not sure how to set the zookeeper location!
>
> Also, how do I verify that the GPU is used? Here is my log:
>
> E0710 18:39:23.837704 72213 cluster.cc:50] proc #0 -> localhost:0 (pid =
> 72213)
> E0710 18:39:23.898723 72241 server.cc:64] Server (group = 0, id = 0) start
> E0710 18:39:24.898967 72242 worker.cc:79] Worker (group = 0, id = 0) start
> on CPU
>
> From this log file it seems that only the CPU is used.
>
> Thanks
> Prasanna
>
>
>
