singa-commits mailing list archives

From build...@apache.org
Subject svn commit: r964001 - in /websites/staging/singa/trunk/content: ./ community/ develop/ docs/
Date Wed, 02 Sep 2015 08:15:16 GMT
Author: buildbot
Date: Wed Sep  2 08:15:15 2015
New Revision: 964001

Log:
Staging update by buildbot for singa

Added:
    websites/staging/singa/trunk/content/docs/data.html
Modified:
    websites/staging/singa/trunk/content/   (props changed)
    websites/staging/singa/trunk/content/community.html
    websites/staging/singa/trunk/content/community/issue-tracking.html
    websites/staging/singa/trunk/content/community/mail-lists.html
    websites/staging/singa/trunk/content/community/source-repository.html
    websites/staging/singa/trunk/content/community/team-list.html
    websites/staging/singa/trunk/content/develop/contribute-code.html
    websites/staging/singa/trunk/content/develop/contribute-docs.html
    websites/staging/singa/trunk/content/develop/how-contribute.html
    websites/staging/singa/trunk/content/develop/schedule.html
    websites/staging/singa/trunk/content/docs.html
    websites/staging/singa/trunk/content/docs/architecture.html
    websites/staging/singa/trunk/content/docs/checkpoint.html
    websites/staging/singa/trunk/content/docs/cnn.html
    websites/staging/singa/trunk/content/docs/code-structure.html
    websites/staging/singa/trunk/content/docs/communication.html

Propchange: websites/staging/singa/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Wed Sep  2 08:15:15 2015
@@ -1 +1 @@
-1696297
+1700726

Modified: websites/staging/singa/trunk/content/community.html
==============================================================================
--- websites/staging/singa/trunk/content/community.html (original)
+++ websites/staging/singa/trunk/content/community.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Community</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/community/issue-tracking.html
==============================================================================
--- websites/staging/singa/trunk/content/community/issue-tracking.html (original)
+++ websites/staging/singa/trunk/content/community/issue-tracking.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Issue Tracking</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/community/mail-lists.html
==============================================================================
--- websites/staging/singa/trunk/content/community/mail-lists.html (original)
+++ websites/staging/singa/trunk/content/community/mail-lists.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Project Mailing Lists</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/community/source-repository.html
==============================================================================
--- websites/staging/singa/trunk/content/community/source-repository.html (original)
+++ websites/staging/singa/trunk/content/community/source-repository.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Source Repository</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/community/team-list.html
==============================================================================
--- websites/staging/singa/trunk/content/community/team-list.html (original)
+++ websites/staging/singa/trunk/content/community/team-list.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; The SINGA Team</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/develop/contribute-code.html
==============================================================================
--- websites/staging/singa/trunk/content/develop/contribute-code.html (original)
+++ websites/staging/singa/trunk/content/develop/contribute-code.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; How to Contribute Code</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/develop/contribute-docs.html
==============================================================================
--- websites/staging/singa/trunk/content/develop/contribute-docs.html (original)
+++ websites/staging/singa/trunk/content/develop/contribute-docs.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; How to Contribute Documentation</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/develop/how-contribute.html
==============================================================================
--- websites/staging/singa/trunk/content/develop/how-contribute.html (original)
+++ websites/staging/singa/trunk/content/develop/how-contribute.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; How to Contribute to SINGA</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/develop/schedule.html
==============================================================================
--- websites/staging/singa/trunk/content/develop/schedule.html (original)
+++ websites/staging/singa/trunk/content/develop/schedule.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Development Schedule</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/docs.html
==============================================================================
--- websites/staging/singa/trunk/content/docs.html (original)
+++ websites/staging/singa/trunk/content/docs.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Documentation</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/docs/architecture.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/architecture.html (original)
+++ websites/staging/singa/trunk/content/docs/architecture.html Wed Sep  2 08:15:15 2015
@@ -1,15 +1,15 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache SINGA &#x2013; System Architecture</title>
+    <title>Apache SINGA &#x2013; </title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
     <link rel="stylesheet" href="../css/site.css" />
     <link rel="stylesheet" href="../css/print.css" media="print" />
@@ -189,7 +189,7 @@
         Apache SINGA</a>
                     <span class="divider">/</span>
       </li>
-        <li class="active ">System Architecture</li>
+        <li class="active "></li>
         
                 
                     
@@ -423,14 +423,15 @@
                         
         <div id="bodyColumn"  class="span10" >
                                   
-            <div class="section">
-<h2><a name="System_Architecture"></a>System Architecture</h2>
-<hr />
+            <p>&#x2014; layout: post title: Architecture category : docs</p>
+<div class="section">
+<h2><a name="tags_:_architecture"></a>tags : [architecture]</h2>
+<p>{% include JB/setup %}</p>
 <div class="section">
 <h3><a name="Logical_Architecture"></a>Logical Architecture</h3>
-<p><img src="../images/distributed/logical.png" style="width: 550px" alt="" /> 
+<p><img src="http://singa.incubator.apache.org/assets/image/logical.png" style="width: 550px" alt="" /> 
 <p><b> Fig.1 - Logical system architecture</b></p>
-<p>SINGA has flexible architecture to support different distributed <a href="frameworks.html">training frameworks</a> (both synchronous and asynchronous). The logical system architecture is shown in Fig.1. The architecture consists of multiple server groups and worker groups:</p>
+<p>SINGA has flexible architecture to support different distributed <a class="externalLink" href="http://singa.incubator.apache.org/docs/frameworks.html">training frameworks</a> (both synchronous and asynchronous). The logical system architecture is shown in Fig.1. The architecture consists of multiple server groups and worker groups:</p>
 
 <ul>
   
@@ -438,7 +439,7 @@
   
 <li><b>Worker group</b>  Each worker group communicates with only one server group.  A worker group trains a complete model replica  against a partition of the training dataset,  and is responsible for computing parameter gradients.  All worker groups run and communicate with the corresponding  server groups asynchronously.  However, inside each worker group,  the workers synchronously compute parameter updates for the model replica.</li>
 </ul>
-<p>There are different strategies to distribute the training workload among workers within a group: </p>
+<p>There are different strategies to distribute the training workload among workers within a group:</p>
 
 <ul>
   
@@ -450,7 +451,7 @@
 </ul></div>
 <div class="section">
 <h3><a name="Implementation"></a>Implementation</h3>
-<p>In SINGA, servers and workers are execution units running in separate threads. They communicate through <a href="communication.html">messages</a>. Every process runs the main thread as a stub that aggregates local messages and forwards them to corresponding (remote) receivers.</p>
+<p>In SINGA, servers and workers are execution units running in separate threads. They communicate through <a class="externalLink" href="http://singa.incubator.apache.org/docs/communication.html">messages</a>. Every process runs the main thread as a stub that aggregates local messages and forwards them to corresponding (remote) receivers.</p>
 <p>Each server group and worker group have a <i>ParamShard</i> object representing a complete model replica. If workers and servers resident in the same process, their <i>ParamShard</i> (partitions) can be configured to share the same memory space. In this case, the messages transferred between different execution units just contain pointers to the data, which reduces the communication cost. Unlike in inter-process cases, the messages have to include the parameter values.</p></div></div>
                   </div>
             </div>

Modified: websites/staging/singa/trunk/content/docs/checkpoint.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/checkpoint.html (original)
+++ websites/staging/singa/trunk/content/docs/checkpoint.html Wed Sep  2 08:15:15 2015
@@ -1,15 +1,15 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache SINGA &#x2013; Checkpoint and Resume</title>
+    <title>Apache SINGA &#x2013; </title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
     <link rel="stylesheet" href="../css/site.css" />
     <link rel="stylesheet" href="../css/print.css" media="print" />
@@ -189,7 +189,7 @@
         Apache SINGA</a>
                     <span class="divider">/</span>
       </li>
-        <li class="active ">Checkpoint and Resume</li>
+        <li class="active "></li>
         
                 
                     
@@ -423,80 +423,84 @@
                         
         <div id="bodyColumn"  class="span10" >
                                   
-            <div class="section">
-<h2><a name="Checkpoint_and_Resume"></a>Checkpoint and Resume</h2>
-<hr />
+            <p>&#x2014; layout: post title: Checkpoint and Resume category : docs</p>
 <div class="section">
-<h3><a name="Applications_of_checkpoint"></a>Applications of checkpoint</h3>
-<p>By taking checkpoints of model parameters, we can</p>
+<h2><a name="tags_:_checkpoint_restore"></a>tags : [checkpoint, restore]</h2>
+<p>{% include JB/setup %}</p>
+<p>SINGA checkpoints model parameters onto disk periodically according to user configured frequency. By checkpointing model parameters, we can</p>
 
 <ol style="list-style-type: decimal">
   
 <li>
-<p>Restore (resume) the training from the last checkpoint. For example, if the program crashes before finishing all training steps.</p></li>
+<p>resume the training from the last checkpointing. For example, if the program crashes before finishing all training steps, we can continue the training using checkpoint files.</p></li>
   
 <li>
-<p>Use them as pre-training results for a similar model. For example, the parameters from training a RBM model can be used to initialize a <a href="auto-encoder.html">deep auto-encoder</a> model.</p></li>
+<p>use them to initialize a similar model. For example, the parameters from training a RBM model can be used to initialize a <a class="externalLink" href="http://singa.incubator.apache.org/docs/rbm">deep auto-encoder</a> model.</p></li>
 </ol></div>
 <div class="section">
-<h3><a name="Instructions_for_checkpoint_and_resume"></a>Instructions for checkpoint and resume</h3>
-<p>Checkpoint is controlled by two model configuration fields: <tt>checkpoint_after</tt> (start checkpoint after this number of training steps) and <tt>checkpoint_frequency</tt>. The checkpoint files are located at <tt>WORKSPACE/checkpoint/stepSTEP-workerWORKERID.bin</tt>.</p>
-<p>The following configuration shows an example,</p>
+<h2><a name="Configuration"></a>Configuration</h2>
+<p>Checkpointing is controlled by two configuration fields:</p>
+
+<ul>
+  
+<li><tt>checkpoint_after</tt>, start checkpointing after this number of training steps,</li>
+  
+<li><tt>checkpoint_freq</tt>, frequency of doing checkpointing.</li>
+</ul>
+<p>For example,</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">model {
-  ...
-  checkpoint_after: 100
-  checkpoint_frequency: 300
-  ...
-}
+<div class="source"><pre class="prettyprint"># job.conf
+workspace: &quot;WORKSPACE&quot;
+checkpoint_after: 100
+checkpoint_frequency: 300
+...
 </pre></div></div>
-<p>After training for 700 steps, under WORKSPACE/checkpoint folder, there would be two checkpoint files (training on single node):</p>
+<p>Checkpointing files are located at <i>WORKSPACE/checkpoint/stepSTEP-workerWORKERID.bin</i>. For the above configuration, after training for 700 steps, there would be two checkpointing files,</p>
 
 <div class="source">
 <div class="source"><pre class="prettyprint">step400-worker0.bin
 step700-worker0.bin
-</pre></div></div>
+</pre></div></div></div>
 <div class="section">
-<h4><a name="Application_1"></a>Application 1</h4>
-<p>We can resume the training from the last checkpoint (i.e., step 700) by:</p>
+<h2><a name="Application_-_resuming_training"></a>Application - resuming training</h2>
+<p>We can resume the training from the last checkpoint (i.e., step 700) by,</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">./bin/singa-run.sh -workspace=WORKSPACE --resume
-</pre></div></div></div>
+<div class="source"><pre class="prettyprint">./bin/singa-run.sh -conf JOB_CONF -resume
+</pre></div></div>
+<p>There is no change to the job configuration.</p></div>
 <div class="section">
-<h4><a name="Application_2"></a>Application 2</h4>
-<p>We can also use the checkpoint file from step 400 as the pre-trained model for a new model by configuring the job.conf of the new model as:</p>
+<h2><a name="Application_-_model_initialization"></a>Application - model initialization</h2>
+<p>We can also use the checkpointing file from step 400 to initialize a new model by configuring the new job as,</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">model {
-  ...
-  checkpoint : &quot;WORKSPACE/checkpoint/step400-worker0.bin&quot;
-  ...
-}
+<div class="source"><pre class="prettyprint"># job.conf
+checkpoint : &quot;WORKSPACE/checkpoint/step400-worker0.bin&quot;
+...
 </pre></div></div>
-<p>If there are multiple checkpoint files for the same snapshot due to model partitioning, all the checkpoint files should be added:</p>
+<p>If there are multiple checkpointing files for the same snapshot due to model partitioning, all the checkpointing files should be added,</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">model {
-  ...
-  checkpoint : &quot;WORKSPACE/checkpoint/step400-worker0.bin&quot;
-  checkpoint : &quot;WORKSPACE/checkpoint/step400-worker1.bin&quot;
-  ...
-}
+<div class="source"><pre class="prettyprint"># job.conf
+checkpoint : &quot;WORKSPACE/checkpoint/step400-worker0.bin&quot;
+checkpoint : &quot;WORKSPACE/checkpoint/step400-worker1.bin&quot;
+...
 </pre></div></div>
-<p>The launching command is the same as starting a new job</p>
+<p>The training command is the same as starting a new job,</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">./bin/singa-run.sh -workspace=WORKSPACE
-</pre></div></div></div></div>
+<div class="source"><pre class="prettyprint">./bin/singa-run.sh -conf JOB_CONF
+</pre></div></div>
+<p>{% comment %}</p></div>
 <div class="section">
-<h3><a name="Implementation_details"></a>Implementation details</h3>
-<p>The checkpoint is done in the Worker class and controlled by two model configuration fields: <tt>checkpoint_after</tt> and <tt>checkpoint_frequency</tt>. Only Params owning the param values from the first group are dumped onto into checkpoint files. For one Param object, its name, version and values are saved. It is possible that the snapshot is separated into multiple files because the neural net is partitioned into multiple workers.</p>
-<p>The Worker&#x2019;s InitLocalParam will initialize Params from checkpoint files if the <tt>checkpoint</tt> field is set. Otherwise it randomly initialize them using user configured initialization method. The Param objects are matched based on name. If the Param is not configured with a name, NeuralNet class will automatically create one for it based on the name of the layer to which the Param object belongs. The <tt>checkpoint</tt> can be set by users (Application 1) or by the Resume function (Application 2) of the Trainer class, which finds the files for the latest snapshot and add them to the <tt>checkpoint</tt> filed. It also sets the <tt>step</tt> field of model configuration to the checkpoint step (extracted from file name).</p></div>
+<h2><a name="Advanced_user_guide"></a>Advanced user guide</h2>
+<p>Checkpointing is done in the <a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1Worker.html">Worker class</a>. Only <tt>Param</tt>s from the first group are dumped into checkpointing files. For a <tt>Param</tt> object, its name, version and values are saved. It is possible that the snapshot is separated into multiple files because the neural net is partitioned into multiple workers.</p>
+<p>The Worker&#x2019;s <tt>InitLocalParam</tt> function will initialize parameters from checkpointing files if the <tt>checkpoint</tt> field is set. Otherwise it randomly initialize them using user configured initialization method. The <tt>Param</tt> objects are matched based on name. If a <tt>Param</tt> object is not configured with a name, <tt>NeuralNet</tt> class will automatically create one for it based on the name of the layer. The <tt>checkpoint</tt> can be set by users (Application 1) or by the <tt>Resume</tt> function (Application 2) of the Trainer class, which finds the files for the latest snapshot and add them to the <tt>checkpoint</tt> filed. It also sets the <tt>step</tt> field of model configuration to the checkpoint step (extracted from file name).</p>
 <div class="section">
 <h3><a name="Caution"></a>Caution</h3>
-<p>Both two applications must be taken carefully when Param objects are partitioned due to model partitioning. Because if the training is done using 2 workers, while the new model (or continue training) is trained with 3 workers, then the same original Param object is partitioned in different ways and hence cannot be matched.</p></div></div>
+<p>Both two applications must be taken carefully when <tt>Param</tt> objects are partitioned due to model partitioning. Because if the training is done using 2 workers, while the new model (or continue training) is trained with 3 workers, then the same original <tt>Param</tt> object is partitioned in different ways and hence cannot be matched.</p>
+<p>{% endcomment %}</p></div></div>
                   </div>
             </div>
           </div>

Modified: websites/staging/singa/trunk/content/docs/cnn.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/cnn.html (original)
+++ websites/staging/singa/trunk/content/docs/cnn.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; </title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
@@ -21,7 +21,7 @@
     <script type="text/javascript" src="../js/apache-maven-fluido-1.4.min.js"></script>
 
     
-    <meta name="Notice" content="Licensed to the Apache Software Foundation (ASF) under one            or more contributor license agreements.  See the NOTICE file            distributed with this work for additional information            regarding copyright ownership.  The ASF licenses this file            to you under the Apache License, Version 2.0 (the            &quot;License&quot;); you may not use this file except in compliance            with the License.  You may obtain a copy of the License at            .              http://www.apache.org/licenses/LICENSE-2.0            .            Unless required by applicable law or agreed to in writing,            software distributed under the License is distributed on an            &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY            KIND, either express or implied.  See the License for the            specific language governing permissions and limitations            under the License." />              </head>
+                  </head>
         <body class="topBarEnabled">
           
     
@@ -425,62 +425,258 @@
                         
         <div id="bodyColumn"  class="span10" >
                                   
-            
-<p>This example will show you how to use SINGA to train a CNN model using cifar10 dataset.</p>
+            <p>&#x2014; layout: post title: Example &#x2014; Convolution Neural Network category : docs</p>
 <div class="section">
+<h2><a name="tags_:_cnn_example"></a>tags : [cnn, example]</h2>
+<p>{% include JB/setup %}</p>
+<p>Convolutional neural network (CNN) is a type of feed-forward artificial neural network widely used for image and video classification. In this example, we will use a deep CNN model to do image classification for the <a class="externalLink" href="http://www.cs.toronto.edu/~kriz/cifar.html">CIFAR10 dataset</a>.</p></div>
 <div class="section">
-<h3><a name="Prepare_for_the_data"></a>Prepare for the data</h3>
+<h2><a name="Running_instructions"></a>Running instructions</h2>
+<p>Please refer to the <a class="externalLink" href="http://singa.incubator.apache.org/docs/installation">installation</a> page for instructions on building SINGA, and the <a class="externalLink" href="http://singa.incubator.apache.org/docs/quick-start">quick start</a> for instructions on starting zookeeper.</p>
+<p>We have provided scripts for preparing the training and test dataset in <i>examples/cifar10/</i>.</p>
 
-<ul>
-  
-<li>First go to the <tt>example/cifar10/</tt> folder for preparing the dataset. There should be a makefile example called Makefile.example in the folder. Run the command <tt>cp Makefile.example Makefile</tt> to generate the makefile. Then run the command <tt>make download</tt> and <tt>make create</tt> in the current folder to download cifar10 dataset and prepare for the training and testing datashard.</li>
-</ul></div>
+<div class="source">
+<div class="source"><pre class="prettyprint"># in examples/cifar10
+$ cp Makefile.example Makefile
+$ make download
+$ make create
+</pre></div></div>
+<p>After the datasets are prepared, we start the training by</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">./bin/singa-run.sh -conf examples/cifar10/job.conf
+</pre></div></div>
+<p>After it is started, you should see output like</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">Record job information to /tmp/singa-log/job-info/job-2-20150817-055601
+Executing : ./singa -conf /xxx/incubator-singa/examples/cifar10/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 2
+E0817 06:56:18.868259 33849 cluster.cc:51] proc #0 -&gt; 192.168.5.128:49152 (pid = 33849)
+E0817 06:56:18.928452 33871 server.cc:36] Server (group = 0, id = 0) start
+E0817 06:56:18.928469 33872 worker.cc:134] Worker (group = 0, id = 0) start
+E0817 06:57:13.657302 33849 trainer.cc:373] Test step-0, loss : 2.302588, accuracy : 0.077900
+E0817 06:57:17.626708 33849 trainer.cc:373] Train step-0, loss : 2.302578, accuracy : 0.062500
+E0817 06:57:24.142645 33849 trainer.cc:373] Train step-30, loss : 2.302404, accuracy : 0.131250
+E0817 06:57:30.813354 33849 trainer.cc:373] Train step-60, loss : 2.302248, accuracy : 0.156250
+E0817 06:57:37.556655 33849 trainer.cc:373] Train step-90, loss : 2.301849, accuracy : 0.175000
+E0817 06:57:44.971276 33849 trainer.cc:373] Train step-120, loss : 2.301077, accuracy : 0.137500
+E0817 06:57:51.801949 33849 trainer.cc:373] Train step-150, loss : 2.300410, accuracy : 0.135417
+E0817 06:57:58.682281 33849 trainer.cc:373] Train step-180, loss : 2.300067, accuracy : 0.127083
+E0817 06:58:05.578366 33849 trainer.cc:373] Train step-210, loss : 2.300143, accuracy : 0.154167
+E0817 06:58:12.518497 33849 trainer.cc:373] Train step-240, loss : 2.295912, accuracy : 0.185417
+</pre></div></div>
+<p>After a certain number of training steps (depending on the configuration), or when the job finishes, SINGA will <a class="externalLink" href="http://singa.incubator.apache.org/docs/checkpoint">checkpoint</a> the model parameters.</p></div>
+<div class="section">
+<h2><a name="Details"></a>Details</h2>
+<p>To train a model in SINGA, you need to prepare the datasets, and a job configuration which specifies the neural net structure, training algorithm (BP or CD), SGD update algorithm (e.g. Adagrad), number of training/test steps, etc.</p>
+<div class="section">
+<h3><a name="Data_preparation"></a>Data preparation</h3>
+<p>Before using SINGA, you need to write a program that pre-processes your dataset into a format that SINGA can read. Please refer to the <a class="externalLink" href="http://singa.incubator.apache.org/docs/data#example---cifar-dataset">Data Preparation</a> page for details about preparing the CIFAR10 dataset.</p></div>
 <div class="section">
-<h3><a name="Set_job_configuration."></a>Set job configuration.</h3>
+<h3><a name="Neural_net"></a>Neural net</h3>
+<p>Figure 1 shows the net structure of the CNN model used in this example, which is configured following <a class="externalLink" href="https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg">this page</a>. The dashed circle represents one feature transformation stage, which generally has four layers as shown in the figure. Sometimes the rectifier layer and the normalization layer are omitted or swapped within a stage. This example has 3 such stages.</p>
+<p>Next we follow the guides in the <a class="externalLink" href="http://singa.incubator.apache.org/docs/neural-net">neural net page</a> and the <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer">layer page</a> to write the neural net configuration.</p>
+
+<div style="text-align: center">
+<img src="http://singa.incubator.apache.org/assets/image/cnn-example.png" style="width: 200px" alt="" /> <br />
+<b>Figure 1 - Net structure of the CNN example.</b>
+</div>
 
 <ul>
   
-<li>If you just want to use the training model provided in this example, you can just use job.conf file in current directory. Fig. 1 gives an example of CNN struture. In this example, we define a CNN model that contains 3 convolution+relu+maxpooling+normalization layers. If you want to learn more about how it is configured, you can go to <a class="externalLink" href="http://singa.incubator.apache.org/docs/model-config.html">Model Configuration</a> to get details.</li>
+<li>
+<p>We configure a <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#data-layers">data layer</a> to read the training/testing <tt>Records</tt> from <tt>DataShard</tt>.</p>
+  
+<div class="source">
+<div class="source"><pre class="prettyprint">layer{
+    name: &quot;data&quot;
+    type: kShardData
+    sharddata_conf {
+      path: &quot;examples/cifar10/cifar10_train_shard&quot;
+      batchsize: 16
+      random_skip: 5000
+    }
+    exclude: kTest  # exclude this layer for the testing net
+  }
+layer{
+    name: &quot;data&quot;
+    type: kShardData
+    sharddata_conf {
+      path: &quot;examples/cifar10/cifar10_test_shard&quot;
+      batchsize: 100
+    }
+    exclude: kTrain # exclude this layer for the training net
+  }
+</pre></div></div></li>
+  
+<li>
+<p>We configure two <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#parser-layers">parser layers</a> to extract the image feature and label from the <tt>Record</tt>s loaded by the <i>data</i> layer.</p>
+  
+<div class="source">
+<div class="source"><pre class="prettyprint">layer{
+    name:&quot;rgb&quot;
+    type: kRGBImage
+    srclayers: &quot;data&quot;
+    rgbimage_conf {
+      meanfile: &quot;examples/cifar10/image_mean.bin&quot; # normalize image feature
+    }
+  }
+layer{
+    name: &quot;label&quot;
+    type: kLabel
+    srclayers: &quot;data&quot;
+  }
+</pre></div></div></li>
 </ul>
 
-<div style="text-align: center">
-<img src="../images/dcnn-cifar10.png" style="width: 280px" alt="" /> <br />Fig. 1: CNN example </img>
-</div></div>
-<div class="section">
-<h3><a name="Run_SINGA"></a>Run SINGA</h3>
-
 <ul>
   
-<li>All script of SINGA should be run in the root folder of SINGA. First you need to start the zookeeper service if zookeeper is not started. The command is <tt>./bin/zk-service start</tt>. Then you can run the command <tt>./bin/singa-run.sh -conf examples/cifar10/job.conf</tt> to start a SINGA job using examples/cifar10/job.conf as the job configuration. After it is started, you should get a screenshots like the following:</li>
+<li>
+<p>We configure layers for the feature transformation as follows (all layers are built-in layers in SINGA; hyper-parameters of these layers are set according to <a class="externalLink" href="https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg">Alex&#x2019;s setting</a>).</p>
+  
+<div class="source">
+<div class="source"><pre class="prettyprint">layer {
+    name: &quot;conv1&quot;
+    type: kConvolution
+    srclayers: &quot;rgb&quot;
+    convolution_conf {
+      num_filters: 32
+      kernel: 5
+      stride: 1
+      pad:2
+    }
+    param {
+      name: &quot;w1&quot;
+      init {
+        type:kGaussian
+        std:0.0001
+      }
+    }
+    param {
+      name: &quot;b1&quot;
+      lr_scale:2.0
+      init {
+        type: kConstant
+        value:0
+      }
+    }
+  }
+
+  layer {
+    name: &quot;pool1&quot;
+    type: kPooling
+    srclayers: &quot;conv1&quot;
+    pooling_conf {
+      pool: MAX
+      kernel: 3
+      stride: 2
+    }
+  }
+  layer {
+    name: &quot;relu1&quot;
+    type: kReLU
+    srclayers:&quot;pool1&quot;
+  }
+  layer {
+    name: &quot;norm1&quot;
+    type: kLRN
+    lrn_conf {
+      local_size: 3
+      alpha: 5e-05
+      beta: 0.75
+    }
+    srclayers:&quot;relu1&quot;
+  }
+</pre></div></div></li>
 </ul>
+<p>The configurations for the other two stages are omitted here.</p>
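+<p>The layer sizes implied by the first stage above can be checked with a short Python sketch. It assumes the usual output-size formula for convolution and pooling windows and a 32x32 CIFAR10 input. The helper <tt>out_size</tt> and its <tt>ceil_mode</tt> switch are hypothetical names for illustration, not part of SINGA, and which rounding SINGA applies in its pooling layer is not stated on this page, so both variants are shown.</p>

```python
import math


def out_size(in_size, kernel, stride, pad=0, ceil_mode=False):
    """Spatial output size of a convolution or pooling window.

    Illustrative helper (not a SINGA function). ceil_mode selects
    whether a partial window at the border still yields an output.
    """
    span = (in_size + 2 * pad - kernel) / stride
    steps = math.ceil(span) if ceil_mode else math.floor(span)
    return steps + 1


# conv1: kernel 5, stride 1, pad 2 on a 32x32 image keeps the size.
conv1 = out_size(32, kernel=5, stride=1, pad=2)

# pool1: kernel 3, stride 2, no padding; the result depends on rounding.
pool1_floor = out_size(conv1, kernel=3, stride=2)
pool1_ceil = out_size(conv1, kernel=3, stride=2, ceil_mode=True)

print(conv1, pool1_floor, pool1_ceil)
```

+<p>The ReLU and LRN layers do not change the spatial size, so the size after pooling is also the input size of the next stage.</p>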
 
+<ul>
+  
+<li>
+<p>There is an <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#innerproductlayer">inner product layer</a> after the 3 transformation stages, which is configured with 10 output units, i.e., the total number of labels. The weight matrix param is configured with a large weight decay scale to reduce over-fitting.</p>
+  
 <div class="source">
-<div class="source"><pre class="prettyprint">    xxx@yyy:zzz/incubator-singa$ ./bin/singa-run.sh -conf examples/cifar10/job.conf
-    Unique JOB_ID is 2
-    Record job information to /tmp/singa-log/job-info/job-2-20150817-055601
-    Executing : ./singa -conf /xxx/incubator-singa/examples/cifar10/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 2
-    E0817 06:56:18.868259 33849 cluster.cc:51] proc #0 -&gt; 192.168.5.128:49152 (pid = 33849)
-    E0817 06:56:18.928452 33871 server.cc:36] Server (group = 0, id = 0) start
-    E0817 06:56:18.928469 33872 worker.cc:134] Worker (group = 0, id = 0) start
-    E0817 06:57:13.657302 33849 trainer.cc:373] Test step-0, loss : 2.302588, accuracy : 0.077900
-    E0817 06:57:17.626708 33849 trainer.cc:373] Train step-0, loss : 2.302578, accuracy : 0.062500
-    E0817 06:57:24.142645 33849 trainer.cc:373] Train step-30, loss : 2.302404, accuracy : 0.131250
-    E0817 06:57:30.813354 33849 trainer.cc:373] Train step-60, loss : 2.302248, accuracy : 0.156250
-    E0817 06:57:37.556655 33849 trainer.cc:373] Train step-90, loss : 2.301849, accuracy : 0.175000
-    E0817 06:57:44.971276 33849 trainer.cc:373] Train step-120, loss : 2.301077, accuracy : 0.137500
-    E0817 06:57:51.801949 33849 trainer.cc:373] Train step-150, loss : 2.300410, accuracy : 0.135417
-    E0817 06:57:58.682281 33849 trainer.cc:373] Train step-180, loss : 2.300067, accuracy : 0.127083
-    E0817 06:58:05.578366 33849 trainer.cc:373] Train step-210, loss : 2.300143, accuracy : 0.154167
-    E0817 06:58:12.518497 33849 trainer.cc:373] Train step-240, loss : 2.295912, accuracy : 0.185417
-</pre></div></div>
-<p>After the training of some steps (depends on the setting) or the job is finished, SINGA will checkpoint the current parameter. In the next time, you can train (or use for your application) by loading the checkpoint. Please refer to <a class="externalLink" href="http://singa.incubator.apache.org/docs/checkpoint.html">Checkpoint</a> for the use of checkpoint.</p></div>
+<div class="source"><pre class="prettyprint">layer {
+    name: &quot;ip1&quot;
+    type: kInnerProduct
+    srclayers:&quot;pool3&quot;
+    innerproduct_conf {
+      num_output: 10
+    }
+    param {
+      name: &quot;w4&quot;
+      wd_scale:250
+      init {
+        type:kGaussian
+        std:0.01
+      }
+    }
+    param {
+      name: &quot;b4&quot;
+      lr_scale:2.0
+      wd_scale:0
+      init {
+        type: kConstant
+        value:0
+      }
+    }
+  }
+</pre></div></div></li>
+  
+<li>
+<p>The last layer is a <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#softmaxloss">Softmax loss layer</a>.</p>
+  
+<div class="source">
+<div class="source"><pre class="prettyprint">  layer{
+    name: &quot;loss&quot;
+    type: kSoftmaxLoss
+    softmaxloss_conf{
+      topk:1
+    }
+    srclayers:&quot;ip1&quot;
+    srclayers: &quot;label&quot;
+  }
+</pre></div></div></li>
+</ul></div>
 <div class="section">
-<h3><a name="Build_your_own_model"></a>Build your own model</h3>
+<h3><a name="Updater"></a>Updater</h3>
+<p>The <a class="externalLink" href="http://singa.incubator.apache.org/docs/updater#updater">normal SGD updater</a> is selected. The learning rate decreases in discrete steps (a staircase schedule), and is configured using the <a class="externalLink" href="http://singa.incubator.apache.org/docs/updater#kfixedstep">kFixedStep</a> type.</p>
 
-<ul>
-  
-<li>If you want to specify you own model, then you need to decribe it in the job.conf file. It should contain the neurualnet structure, training algorithm(backforward or contrastive divergence etc.), SGD update algorithm(e.g. Adagrad), number of training/test steps and training/test frequency, and display features and etc. SINGA will read job.conf as a Google protobuf class <a href="../src/proto/job.proto">JobProto</a>. You can also refer to the <a class="externalLink" href="http://singa.incubator.apache.org/docs/programmer-guide.html">Programmer Guide</a> to get details.</li>
-</ul></div></div>
+<div class="source">
+<div class="source"><pre class="prettyprint">updater{
+  type: kSGD
+  weight_decay:0.004
+  learning_rate {
+    type: kFixedStep
+    fixedstep_conf:{
+      step:0             # lr for step 0-60000 is 0.001
+      step:60000         # lr for step 60000-65000 is 0.0001
+      step:65000         # lr for step 65000- is 0.00001
+      step_lr:0.001
+      step_lr:0.0001
+      step_lr:0.00001
+    }
+  }
+}
+</pre></div></div></div>
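+<p>The staircase schedule configured above can be sketched in Python as a lookup over the <tt>step</tt> and <tt>step_lr</tt> lists. This is only an illustration of the behaviour described in the configuration comments, not SINGA code: the function name <tt>fixed_step_lr</tt> is hypothetical, and treating each boundary step (e.g. 60000) as already using the new rate is an assumption.</p>

```python
from bisect import bisect_right

# Values copied from the fixedstep_conf block above.
STEPS = [0, 60000, 65000]
STEP_LRS = [0.001, 0.0001, 0.00001]


def fixed_step_lr(step, steps=STEPS, lrs=STEP_LRS):
    """Return the learning rate of the last boundary that is <= step.

    Hypothetical helper; assumes the boundary step itself already
    uses the new learning rate.
    """
    idx = bisect_right(steps, step) - 1
    return lrs[max(idx, 0)]


for s in (0, 59999, 60000, 64999, 65000, 100000):
    print(s, fixed_step_lr(s))
```

+<p>Keeping the boundaries and rates in two parallel lists mirrors the repeated <tt>step</tt> and <tt>step_lr</tt> fields of the configuration.</p>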
+<div class="section">
+<h3><a name="TrainOneBatch_algorithm"></a>TrainOneBatch algorithm</h3>
+<p>The CNN model is a feed-forward model, so it should be configured to use the <a class="externalLink" href="http://singa.incubator.apache.org/docs/train-one-batch#back-propagation">Back-propagation algorithm</a>.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">alg: kBP
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Cluster_setting"></a>Cluster setting</h3>
+<p>The following configuration sets a single worker and a single server for training. The <a class="externalLink" href="http://singa.incubator.apache.org/docs/frameworks">Training frameworks</a> page introduces the configurations of several distributed training frameworks.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">cluster {
+  nworker_groups: 1
+  nserver_groups: 1
+}
+</pre></div></div></div></div>
                   </div>
             </div>
           </div>

Modified: websites/staging/singa/trunk/content/docs/code-structure.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/code-structure.html (original)
+++ websites/staging/singa/trunk/content/docs/code-structure.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Code Structure</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/docs/communication.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/communication.html (original)
+++ websites/staging/singa/trunk/content/docs/communication.html Wed Sep  2 08:15:15 2015
@@ -1,15 +1,15 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache SINGA &#x2013; Communication</title>
+    <title>Apache SINGA &#x2013; </title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
     <link rel="stylesheet" href="../css/site.css" />
     <link rel="stylesheet" href="../css/print.css" media="print" />
@@ -189,7 +189,7 @@
         Apache SINGA</a>
                     <span class="divider">/</span>
       </li>
-        <li class="active ">Communication</li>
+        <li class="active "></li>
         
                 
                     
@@ -423,14 +423,15 @@
                         
         <div id="bodyColumn"  class="span10" >
                                   
-            <div class="section">
-<h2><a name="Communication"></a>Communication</h2>
-<hr />
+            <div class="section">
+<h2><a name="Communication"></a>Communication</h2>
 <p>Different messaging libraries have different benefits and drawbacks. For instance, MPI provides fast message passing between GPUs (using GPUDirect), but does not support fault tolerance well. In contrast, systems using ZeroMQ can be fault-tolerant, but do not support GPUDirect. ZeroMQ also lacks MPI&#x2019;s AllReduce function, which is efficient for data aggregation in distributed training. In Singa, we provide general messaging APIs for communication between threads within a process and across processes, and let users choose the underlying implementation (MPI or ZeroMQ) that meets their requirements.</p>
 <p>Singa&#x2019;s messaging library consists of two components, namely the message, and the socket used to send and receive messages. <b>Socket</b> refers to a Singa-defined data structure rather than the Linux socket. We will introduce the two components in detail with the following figure as an example architecture.</p>
 <p><img src="../images/arch/arch2.png" style="width: 550px" alt="" /> <img src="../images/arch/comm.png" style="width: 550px" alt="" /> 
 <p><b> Fig.1 - Example physical architecture and network connection</b></p>
-<p>Fig.1 shows an example physical architecture and its network connection. <a href="architecture.html}">Section-partition server side ParamShard</a> has a detailed description of the architecture. Each process consists of one main thread running the stub and multiple background threads running the worker and server tasks. The stub of the main thread forwards messages among threads . The worker and server tasks are performed by the background threads.</p>
+<p>Fig.1 shows an example physical architecture and its network connection. <a class="externalLink" href="http://singa.incubator.apache.org/docs/architecture.html">Section-partition server side ParamShard</a> has a detailed description of the architecture. Each process consists of one main thread running the stub and multiple background threads running the worker and server tasks. The stub in the main thread forwards messages among threads. The worker and server tasks are performed by the background threads.</p>
 <div class="section">
 <h3><a name="Message"></a>Message</h3>
 <p><object type="image/svg+xml" style="width: 100px" data="../images/msg.svg"> Not supported </object> 
@@ -799,79 +800,50 @@ class SafeQueue{
 </pre></div></div>
 <p>For inter-process communication, we serialize the messages and call MPI&#x2019;s send/receive functions to transfer them. All inter-process connections are set up by MPI at the beginning. Consequently, the Connect and Bind functions do nothing for both inter-process and intra-process communication.</p>
 <p>MPI&#x2019;s AllReduce function is efficient for data aggregation in distributed training. For example, <a class="externalLink" href="http://arxiv.org/abs/1501.02876">DeepImage of Baidu</a> uses AllReduce to aggregate the parameter updates from all workers. It has an architecture similar to <a href="architecture.html">Fig.2</a>, where every process has a server group and is connected with all other processes. Hence, we can implement DeepImage in Singa by simply using MPI&#x2019;s AllReduce function for inter-process communication.</p>
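+<p>The effect of a sum AllReduce on parameter updates can be illustrated with a small single-process Python sketch. It only simulates the result (every worker ends up with the element-wise sum of all updates); it does not call MPI or any Singa API, and the function name <tt>allreduce_sum</tt> is hypothetical.</p>

```python
def allreduce_sum(worker_updates):
    """Simulate a sum AllReduce over equally sized update vectors.

    worker_updates holds one list of floats per worker. The return
    value holds, for each worker, its own copy of the element-wise sum.
    """
    total = [sum(values) for values in zip(*worker_updates)]
    # Each worker receives an independent copy of the aggregated result.
    return [list(total) for _ in worker_updates]


updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reduced = allreduce_sum(updates)
print(reduced[0])
```

+<p>A real MPI implementation obtains the same result without gathering all updates in one place, which is why it suits data aggregation across many processes.</p>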
-<!-- #### Server socket
+</div>
+<div class="section">
+<h4><a name="Server_socket"></a>Server socket</h4>
+<p>Each server has a DEALER socket to communicate with the stub in the main thread via an <i>in-proc</i> socket. It receives requests that are issued by workers and other servers and forwarded by the ROUTER of the stub. Since the requests are forwarded by the stub, the location of workers is transparent to server threads. The stub records the locations of workers and servers.</p>
+<p>As explained previously in the APIs for parameter management, some requests may not be processed immediately but have to be re-queued. For instance, a Get request cannot be processed if the requested parameter is not available, i.e., the parameter has not been put into the server&#x2019;s ParamShard. Re-queueing is implemented by sending the message to the ROUTER socket of the stub, which treats it as a newly arrived request and queues it for processing.</p></div>
+<div class="section">
+<h4><a name="Worker_socket"></a>Worker socket</h4>
+<p>Each worker thread has a DEALER socket to communicate with the stub in the main thread via an <i>in-proc</i> socket. It sends (Get/Update) requests to the ROUTER in the stub, which forwards each request to (local or remote) processes. If the ParamShard is partitioned on the worker side, the worker may also exchange data with other workers via the DEALER socket. Again, the location of the other side (a server or worker) of the communication is transparent to the worker. The stub handles the addressing.</p>
+<p>PMClient executes the training logic, during which it generates GET and UPDATE requests. A request received at the worker&#x2019;s main thread contains the ID of the PMClient instance. The worker determines which server to send the request to based on its content, then sends it via the corresponding socket. Response messages received from any of the server sockets are forwarded to the in-proc ROUTER socket. Since each response header contains the PMClient ID, it is routed to the correct instance.</p></div>
+<div class="section">
+<h4><a name="Stub_sockets"></a>Stub sockets</h4>
+<div class="section">
+<h5><a name="ROUTER_socket"></a>ROUTER socket</h5>
+<p>The main thread has a ROUTER socket to communicate with background threads.</p>
+<p>It forwards the requests from workers to background servers. There can be multiple servers. If all servers maintain the same (sub) ParamShard, then the request can be forwarded to any of them. Load balancing (e.g., round-robin) can be implemented in the stub to improve performance. If each server maintains a subset of the local ParamShard, then the stub forwards each request to the corresponding server. It also forwards the synchronization requests from remote servers to local servers in the same way.</p>
+<p>In the case of neural network partition (i.e., model partition), neighboring layers transfer data to each other. Hence, the ROUTER forwards data transfer requests from one worker to another worker. The stub looks up the location table to decide where to forward each request.</p></div>
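+<p>The two forwarding policies described above can be sketched in Python. This is a toy model, not Singa code: the class <tt>StubRouter</tt>, its method names, and the modulo mapping from parameter ID to server are assumptions made for illustration only.</p>

```python
from itertools import cycle


class StubRouter:
    """Toy model of how a stub might pick a local server for a request."""

    def __init__(self, servers, replicated):
        self.servers = list(servers)
        # replicated=True: every server holds the same (sub) ParamShard.
        self.replicated = replicated
        self._round_robin = cycle(self.servers)

    def route(self, param_id):
        if self.replicated:
            # Any server can answer, so spread the load round-robin.
            return next(self._round_robin)
        # Each server owns a subset; assumed mapping: param_id modulo count.
        return self.servers[param_id % len(self.servers)]


replicated = StubRouter(["server0", "server1"], replicated=True)
print([replicated.route(7) for _ in range(3)])

partitioned = StubRouter(["server0", "server1"], replicated=False)
print(partitioned.route(3), partitioned.route(4))
```

+<p>With replicated servers the same parameter ID may reach different servers on successive requests, while with partitioned servers it always reaches the server that owns it.</p>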
+<div class="section">
+<h5><a name="DEALER_sockets"></a>DEALER sockets</h5>
+<p>The main thread has multiple DEALER sockets to communicate with other processes, one socket per process. Two processes are connected if one of the following cases exists:</p>
+
+<ul>
+  
+<li>one worker group spans across the two processes;</li>
+  
+<li>two connected server groups are separated in the two processes;</li>
+  
+<li>workers and the subscribed servers are separated in the two processes.</li>
+</ul>
+<p>All messages in SINGA are of multi-frame ZeroMQ format. The figure above demonstrates different types of messages exchanged in the system.</p>
 
-Each server has a DEALER socket to communicate with the stub in the main
-thread via an _in-proc_ socket. It receives requests issued from workers and
-other servers, and forwarded by the ROUTER of the stub. Since the requests are forwarded by the
-stub, we can make the location of workers transparent to server threads. The
-stub records the locations of workers and servers.
-
-As explained previously in the
-[APIs]({{ BASE_PATH }}{% post_url /docs/2015-03-20-parameter-management %})
-for parameter management, some requests may
-not be processed immediately but have to be re-queued. For instance, the Get
-request cannot be processed if the requested parameter is not available, i.e.,
-the parameter has not been put into the server's ParamShard. The re-queueing
-operation is implemented sendings the messages to the ROUTER
-socket of the stub which treats the message as a newly arrived request
-and queues it for processing.
-
-#### Worker socket
-
-Each worker thread has a DEALER socket to communicate with the stub in the main
-thread via an _in-proc_ socket. It sends (Get/Update) requests to the ROUTER in
-the stub which forwards the request to (local or remote) processes. In case of
-the partition of ParamShard of worker side, it may also transfer data with other
-workers via the DEALER socket. Again, the location of the other side (a server
-or worker) of the communication is transparent to the worker. The stub handles
-the addressing.
-
-PMClient executes the training logic, during which it generates GET and UPDATE
-requests. A request received at the worker's main thread contains ID of the
-PMClient instance. The worker determines which server to send the request based
-on its content, then sends it via the corresponding socket. Response messages
-received from any of the server socket are forwarded to the in-proc ROUTER
-socket. Since each response header contains the PMClient ID, it is routed to
-the correct instance.
-
-#### Stub sockets
-
-##### ROUTER socket
-The main thread has a ROUTER socket to communicate with background threads.
-
-It forwards the requests from workers to background servers. There can be
-multiple servers.If all servers maintain the same (sub) ParamShard, then the
-request can be forwarded to any of them. Load-balance (like round-robin) can be
-implemented in the stub to improve the performance. If each server maintains a
-sub-set of the local ParamShard, then the stub forwards each request to the
-corresponding server.  It also forwards the synchronization requests from
-remote servers to local servers in the same way.
-
-In the case of neural network partition (i.e., model partition), neighbor
-layers would transfer data with each other. Hence, the ROUTER would forwards
-data transfer requests from one worker to other worker. The stub looks up the
-location table to decide where to forward each request.
-
-##### DEALER sockets
-
-The main thread has multiple DEALER sockets to communicate with other
-processes, one socket per process. Two processes are connected if one of the
-following cases exists:
-
-  * one worker group spans across the two processes;
-  * two connected server groups are separated in the two processes;
-  * workers and the subscribed servers are separated in the two processes.
-
-
-All messages in SINGA are of multi-frame ZeroMQ format. The figure above demonstrates different types of messages exchanged in the system.
-
-  1. Requests generated by PMClient consist of the parameter content (which could be empty), followed by the parameter ID (key) and the request type (GET/PUT/REQUEST). Responses received by PMClient are also of this format.
-  2. Messages received by the worker's main thread from PMClient instances contain another frame identifying the PMClient connection (or PMClient ID).
-  3. Requests originating form a worker and arriving at the server contain another frame identifying the worker's connection (or Worker ID).
-  4. Requests originating from another server and arriving at the server have the same format as (3), but the first frame identifies the server connection (or Server ID).
-  5. After a PMServer processes a request, it generates a message with the format similar to (3) but with extra frame indicating if the message is to be routed back to a worker (a response message) or to route to another server (a SYNC request).
-  6. When a request is re-queued, the PMServer generates a message and sends it directly to the server's front-end socket. The re-queued request seen by the server's main thread consists of all the frames in (3), followed by a REQUEUED frame, and finally by another frame generated by the ROUTER socket identifying connection from the PMServer instance. The main thread then strips off these additional two frames before  forwarding it to another PMServer instance like another ordinary request. --></div></div></div>
+<ol style="list-style-type: decimal">
+  
+<li>Requests generated by PMClient consist of the parameter content (which could be empty), followed by the parameter ID (key) and the request type (GET/PUT/REQUEST). Responses received by PMClient are also of this format.</li>
+  
+<li>Messages received by the worker&#x2019;s main thread from PMClient instances contain another frame identifying the PMClient connection (or PMClient ID).</li>
+  
+<li>Requests originating from a worker and arriving at the server contain another frame identifying the worker&#x2019;s connection (or Worker ID).</li>
+  
+<li>Requests originating from another server and arriving at the server have the same format as (3), but the first frame identifies the server connection (or Server ID).</li>
+  
+<li>After a PMServer processes a request, it generates a message with a format similar to (3) but with an extra frame indicating whether the message is to be routed back to a worker (a response message) or routed to another server (a SYNC request).</li>
+  
+<li>When a request is re-queued, the PMServer generates a message and sends it directly to the server&#x2019;s front-end socket. The re-queued request seen by the server&#x2019;s main thread consists of all the frames in (3), followed by a REQUEUED frame, and finally by another frame generated by the ROUTER socket identifying the connection from the PMServer instance. The main thread then strips off these two additional frames before forwarding the request to another PMServer instance like any other ordinary request.</li>
+</ol></div></div></div></div>
                   </div>
             </div>
           </div>

Added: websites/staging/singa/trunk/content/docs/data.html
==============================================================================
    (empty)


