singa-commits mailing list archives

From build...@apache.org
Subject svn commit: r959251 - in /websites/staging/singa/trunk/content: ./ quick-start.html
Date Thu, 23 Jul 2015 05:53:31 GMT
Author: buildbot
Date: Thu Jul 23 05:53:30 2015
New Revision: 959251

Log:
Staging update by buildbot for singa

Modified:
    websites/staging/singa/trunk/content/   (props changed)
    websites/staging/singa/trunk/content/quick-start.html

Propchange: websites/staging/singa/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Jul 23 05:53:30 2015
@@ -1 +1 @@
-1692347
+1692349

Modified: websites/staging/singa/trunk/content/quick-start.html
==============================================================================
--- websites/staging/singa/trunk/content/quick-start.html (original)
+++ websites/staging/singa/trunk/content/quick-start.html Thu Jul 23 05:53:30 2015
@@ -497,18 +497,18 @@ training data shard, test data shard and
 cd ../..
 ./bin/singa-run.sh -workspace=examples/cifar10
 </pre></div></div>
-<p>Note: we have changed the command line arguments from <tt>-cluster... -model=...</tt> to <tt>-workspace</tt>. The <tt>workspace</tt> folder must have a job.conf file which specifies the cluster (number of workers, number of servers, etc) and model configuration.</p>
+<p>Note: we have changed the command line arguments from <tt>-cluster.. -model..</tt> to <tt>-workspace</tt>. The <tt>workspace</tt> folder must have a job.conf file which specifies the cluster (number of workers, number of servers, etc) and model configuration.</p>
 <p>Some training information will be shown on the screen like:</p>
 
 <div class="source">
 <div class="source"><pre class="prettyprint">Starting zookeeper ... already running as process 21660.
-Generate host list to /home/singa/wangwei/incubator-singa/examples/cifar10/job.hosts
-Generate job id to /home/singa/wangwei/incubator-singa/examples/cifar10/job.id [job_id = 1]
-Executing : ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=1
+Generate host list to SINGA_ROOT/examples/cifar10/job.hosts
+Generate job id to SINGA_ROOT/examples/cifar10/job.id [job_id = 1]
+Executing : ./singa -workspace=SINGA_ROOT/examples/cifar10 -job=1
 proc #0 -&gt; 10.10.10.14:49152 (pid = 26724)
 Server (group = 0, id = 0) start
 Worker (group = 0, id = 0) start
-Generate pid list to /home/singa/wangwei/incubator-singa/examples/cifar10/job.pids
+Generate pid list to SINGA_ROOT/examples/cifar10/job.pids
 Test step-0, loss : 2.302607, accuracy : 0.090100
 Train step-0, loss : 2.302614, accuracy : 0.062500
 Train step-30, loss : 2.302403, accuracy : 0.141129
@@ -524,6 +524,12 @@ Test step-300, loss : 2.256824, accuracy
 Train step-300, loss : 2.292490, accuracy : 0.165282
 </pre></div></div>
 <p>You can find more logs under the <tt>/tmp</tt> folder. Once the training is finished the learned model parameters will be dumped into $workspace/checkpoint folder. The dumped file can be used for continuing the training or as initialization for other similar models. <a href="checkpoint.html">Checkpoint and Resume</a> discusses more details.</p>
+<p>The job can be stopped by</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">./bin/singa-stop.sh
+</pre></div></div>
+<p>It will kill all singa processes.</p>
 <!-- -
 To train the model without any partitioning, you just set the numbers
 in the cluster configuration file (*cluster.conf*) as :
@@ -561,15 +567,15 @@ cluster {
   
 <div class="source">
 <div class="source"><pre class="prettyprint">// hostfile
-logbase-a04
-logbase-a05
-logbase-a06
+singa-node1
+singa-node2
+singa-node3
 ...
 </pre></div></div></li>
   
 <li>
 <p>The zookeeper location must be configured in conf/singa.conf, e.g.,</p>
-<p>zookeeper_host: &#x201c;logbase-a04:2181&#x201d;</p></li>
+<p>zookeeper_host: &#x201c;singa-node1:2181&#x201d;</p></li>
   
 <li>
 <p>Make your ssh command password-free</p></li>
@@ -580,28 +586,19 @@ logbase-a06
 <div class="source"><pre class="prettyprint">./bin/singa-run.sh -workspace=examples/cifar10
 </pre></div></div>
 <p>The <tt>singa-run.sh</tt> will calculate the number of nodes (i.e., processes) to launch and will generate a job.hosts file under workspace by looping through all nodes in conf/hostfile. Hence if there are few nodes in the hostfile, then multiple processes would be launched in one node.</p>
-<p>You can get some job information like job ID and running processes using the singa-console.sh script:</p>
-
-<div class="source">
-<div class="source"><pre class="prettyprint">./bin/singa-console.sh list
-JOB ID    |NUM PROCS
-----------|-----------
-job-4     |2
-</pre></div></div>
 <p>Sample training output is</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">Generate job id to /home/singa/wangwei/incubator-singa/examples/cifar10/job.id [job_id = 4]
-Executing @ logbase-a04 : cd /home/singa/wangwei/incubator-singa; ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
-Executing @ logbase-a05 : cd /home/singa/wangwei/incubator-singa; ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
+<div class="source"><pre class="prettyprint">Generate job id to SINGA_ROOT/examples/cifar10/job.id [job_id = 4]
+Executing @ singa-node1: cd SINGA_ROOT; ./singa -workspace=SINGA_ROOT/examples/cifar10 -job=4
+Executing @ singa-node2: cd SINGA_ROOT; ./singa -workspace=SINGA_ROOT/examples/cifar10 -job=4
 proc #0 -&gt; 10.10.10.15:49152 (pid = 3504)
 proc #1 -&gt; 10.10.10.14:49152 (pid = 27119)
 Server (group = 0, id = 1) start
 Worker (group = 1, id = 0) start
 Server (group = 0, id = 0) start
 Worker (group = 0, id = 0) start
-Generate pid list to
-/home/singa/wangwei/incubator-singa/examples/cifar10/job.pids
+Generate pid list to SINGA_ROOT/examples/cifar10/job.pids
 Test step-0, loss : 2.297355, accuracy : 0.101700
 Train step-0, loss : 2.274724, accuracy : 0.062500
 Train step-30, loss : 2.263850, accuracy : 0.131048
@@ -617,16 +614,18 @@ Test step-300, loss : 1.921962, accuracy
 Train step-300, loss : 2.129271, accuracy : 0.208056
 </pre></div></div>
 <p>We can see that the accuracy (resp. loss) from distributed training increases (resp. decreases) faster than that for the single node training.</p>
-<p>You can stop the training by singa-stop.sh</p>
+<p>You can get some job information like job ID and running processes using the singa-console.sh script:</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">./bin/singa-stop.sh
-Kill singa @ logbase-a04 ...
-Kill singa @ logbase-a05 ...
-bash: line 1: 27119 Killed                  ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
-Kill singa @ logbase-a06 ...
-bash: line 1:  3504 Killed                  ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
-Cleanning metadata in zookeeper ...
+<div class="source"><pre class="prettyprint">./bin/singa-console.sh list
+JOB ID    |NUM PROCS
+----------|-----------
+job-4     |2
+</pre></div></div>
+<p>To kill the job, just run</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">./bin/singa-console.sh kill job-4
 </pre></div></div>
 <!-- -
 In other words,


