singa-commits mailing list archives

From jiny...@apache.org
Subject svn commit: r1700722 [2/3] - /incubator/singa/site/trunk/content/markdown/docs/
Date Wed, 02 Sep 2015 07:59:21 GMT
Modified: incubator/singa/site/trunk/content/markdown/docs/mlp.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/mlp.md?rev=1700722&r1=1700721&r2=1700722&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/mlp.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/mlp.md Wed Sep  2 07:59:20 2015
@@ -1,66 +1,223 @@
-Title:
-Notice:    Licensed to the Apache Software Foundation (ASF) under one
-           or more contributor license agreements.  See the NOTICE file
-           distributed with this work for additional information
-           regarding copyright ownership.  The ASF licenses this file
-           to you under the Apache License, Version 2.0 (the
-           "License"); you may not use this file except in compliance
-           with the License.  You may obtain a copy of the License at
-           .
-             http://www.apache.org/licenses/LICENSE-2.0
-           .
-           Unless required by applicable law or agreed to in writing,
-           software distributed under the License is distributed on an
-           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-           KIND, either express or implied.  See the License for the
-           specific language governing permissions and limitations
-           under the License.
-
-This example will show you how to use SINGA to train a MLP model using mnist dataset.
-
-### Prepare for the data
-* First go to the `example/mnist/` folder for preparing the dataset. There should be a makefile example called Makefile.example in the folder. Run the command `cp Makefile.example Makefile` to generate the makefile.
-Then run the command `make download` and `make create`  in the current folder to download mnist dataset and prepare for the training and testing datashard. 
+---
+layout: post
+title:  Example --- Multilayer Perceptron
+category : docs
+tags : [example, mlp]
+---
+{% include JB/setup %}
+
+
+Multilayer perceptron (MLP) is a feed-forward artificial neural network model.
+An MLP typically consists of multiple layers, with each layer fully
+connected to the next one. In this example, we will use SINGA to train a
+[simple MLP model proposed by Ciresan](http://arxiv.org/abs/1003.0358)
+for classifying handwritten digits from the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).
+
+## Running instructions
+Please refer to the [installation](http://singa.incubator.apache.org/docs/installation) page for
+instructions on building SINGA, and the [quick start](http://singa.incubator.apache.org/docs/quick-start)
+for instructions on starting zookeeper.
+
+We have provided scripts for preparing the training and test datasets in *examples/mnist/*.
+
+    # in examples/mnist
+    $ cp Makefile.example Makefile
+    $ make download
+    $ make create
+
+After the datasets are prepared, we start the training by
+
+    ./bin/singa-run.sh -conf examples/mnist/job.conf
+
+After it is started, you should see output like
+
+    Record job information to /tmp/singa-log/job-info/job-1-20150817-055231
+    Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1
+    E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073)
+    E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start
+    E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start
+    E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, accuracy : 0.109100
+    E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, accuracy : 0.099000
+    E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800
+    E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 2.091030, accuracy : 0.327200
+    E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 1.969412, accuracy : 0.442100
+    E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 1.865466, accuracy : 0.514800
+    E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 1.773849, accuracy : 0.569100
+    E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, accuracy : 0.662100
+    E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 1.659150, accuracy : 0.652600
+    E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 1.574024, accuracy : 0.666000
+    E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 1.529380, accuracy : 0.670500
+    E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 1.443911, accuracy : 0.703500
+    E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000
+    E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500
+    E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900
+
+After training for a certain number of steps (depending on the configuration) or when the
+job finishes, SINGA will [checkpoint](http://singa.incubator.apache.org/docs/checkpoint) the model parameters.
+
+## Details
+
+
+To train a model in SINGA, you need to prepare the datasets,
+and a job configuration which specifies the neural net structure, training
+algorithm (BP or CD), SGD update algorithm (e.g. Adagrad),
+number of training/test steps, etc.
+
+### Data preparation
+Before using SINGA, you need to write a program to pre-process your dataset
+into a format that SINGA can read. Please refer to the
+[Data Preparation](http://singa.incubator.apache.org/docs/data#example---mnist-dataset) page for details on preparing
+the MNIST dataset.
 
-### Set model and cluster configuration.
-* If you just want to use the training model provided in this example, you can just use job.conf file in current directory. Fig. 1 gives an example of MLP struture. In this example, we define a neurualnet that contains 5 hidden layer. fc+tanh is the hidden layer(fc is for the inner product part, and tanh is for the non-linear activation function), and the final softmax layer is represented as fc+loss (inner product and softmax). For each layer, we define its name, input layer(s), basic configurations (e.g. number of nodes, parameter initialization settings). If you want to learn more about how it is configured, you can go to [Model Configuration](http://singa.incubator.apache.org/docs/model-config.html) to get details. 
+
+### Neural net
 
 <div style = "text-align: center">
-<img src = "../images/mlp_example.png" style = "width: 280px"> <br/>Fig. 1: MLP example </img>
+<img src = "http://singa.incubator.apache.org/assets/image/mlp-example.png" style = "width: 230px"/>
+<br/><strong>Figure 1 - Net structure of the MLP example.</strong>
 </div>
 
-### Run SINGA
-
-* All script of SINGA should be run in the root folder of SINGA.
-First you need to start the zookeeper service if zookeeper is not started. The command is `./bin/zk-service start`. 
-Then you can run the command `./bin/singa-run.sh -conf examples/mnist/job.conf` to start a SINGA job using examples/mnist/job.conf as the job configuration.
-After it is started, you should get a screenshots like the following:
-
-        xxx@yyy:zzz/incubator-singa$ ./bin/singa-run.sh -conf examples/mnist/job.conf
-        Unique JOB_ID is 1
-        Record job information to /tmp/singa-log/job-info/job-1-20150817-055231
-        Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1
-        E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073)
-        E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start
-        E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start
-        E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, accuracy : 0.109100
-        E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, accuracy : 0.099000
-        E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800
-        E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 2.091030, accuracy : 0.327200
-        E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 1.969412, accuracy : 0.442100
-        E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 1.865466, accuracy : 0.514800
-        E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 1.773849, accuracy : 0.569100
-        E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, accuracy : 0.662100
-        E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 1.659150, accuracy : 0.652600
-        E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 1.574024, accuracy : 0.666000
-        E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 1.529380, accuracy : 0.670500
-        E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 1.443911, accuracy : 0.703500
-        E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000
-        E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500
-        E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900
-
-After the training of some steps (depends on the setting) or the job is finished, SINGA will checkpoint the current parameter. In the next time, you can train (or use for your application) by loading the checkpoint. Please refer to [Checkpoint](http://singa.incubator.apache.org/docs/checkpoint.html) for the use of checkpoint.
-
-### Build your own model
-* If you want to specify you own model, then you need to decribe  it in the job.conf file. It should contain the neurualnet structure, training algorithm(backforward or contrastive divergence etc.), SGD update algorithm(e.g. Adagrad), number of training/test steps and training/test frequency, and display features and etc. SINGA will read job.conf as a Google protobuf class [JobProto](../src/proto/job.proto). You can also refer to the [Programmer Guide](http://singa.incubator.apache.org/docs/programmer-guide.html) to get details. 
 
+Figure 1 shows the structure of the simple MLP model, which is constructed following
+[Ciresan's paper](http://arxiv.org/abs/1003.0358). The dashed circle contains
+two layers which represent one feature transformation stage. There are 6 such
+stages in total. The sizes of the [InnerProductLayer](http://singa.incubator.apache.org/docs/layer#innerproductlayer)s in these stages decrease from
+2500->2000->1500->1000->500->10.
+
+Next, we follow the guides on the [neural net page](http://singa.incubator.apache.org/docs/neural-net)
+and [layer page](http://singa.incubator.apache.org/docs/layer) to write the neural net configuration.
+
+* We configure a [data layer](http://singa.incubator.apache.org/docs/layer#data-layers) to read
+the training/testing `Records` from `DataShard`.
+
+        layer {
+            name: "data"
+            type: kShardData
+            sharddata_conf {
+              path: "examples/mnist/mnist_train_shard"
+              batchsize: 1000
+            }
+            exclude: kTest
+          }
+
+        layer {
+            name: "data"
+            type: kShardData
+            sharddata_conf {
+              path: "examples/mnist/mnist_test_shard"
+              batchsize: 1000
+            }
+            exclude: kTrain
+          }
+
+* We configure two [parser layers](http://singa.incubator.apache.org/docs/layer#parser-layers)
+to extract the image feature and label from the `Record`s loaded by the *data* layer.
+The [MnistLayer](http://singa.incubator.apache.org/docs/layer#mnistlayer) will normalize the pixel
+values into [-1,1].
+
+        layer{
+            name:"mnist"
+            type: kMnist
+            srclayers: "data"
+            mnist_conf {
+              norm_a: 127.5
+              norm_b: 1
+            }
+          }
+
+        layer{
+            name: "label"
+            type: kLabel
+            srclayers: "data"
+          }
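+
+  As a quick sanity check (assuming the parser computes `pixel / norm_a - norm_b`,
+  which is an assumption consistent with the [-1,1] range stated above): pixel
+  value 0 maps to 0/127.5 - 1 = -1, and pixel value 255 maps to 255/127.5 - 1 = 1.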
+
+* All [InnerProductLayer](http://singa.incubator.apache.org/docs/layer#innerproductlayer)s are configured similarly as,
+
+        layer{
+          name: "fc1"
+          type: kInnerProduct
+          srclayers:"mnist"
+          innerproduct_conf{
+            num_output: 2500
+          }
+          param{
+            name: "w1"
+            init {
+              type: kUniform
+              low:-0.05
+              high:0.05
+            }
+          }
+          param{
+            name: "b1"
+            init {
+              type : kUniform
+              low: -0.05
+              high:0.05
+            }
+          }
+        }
+
+    with the `num_output` decreasing from 2500 to 10.
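+
+    For example, the final inner-product layer would look like the sketch below.
+    The names `fc6`, `w6`, `b6` and the source layer `tanh5` follow the naming
+    pattern above but are assumptions, not copied from the actual job.conf.
+
+        layer{
+          name: "fc6"
+          type: kInnerProduct
+          srclayers: "tanh5"
+          innerproduct_conf{
+            num_output: 10
+          }
+          # params "w6" and "b6" are initialized the same way as "w1" and "b1"
+        }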
+
+* All [TanhLayer](http://singa.incubator.apache.org/docs/layer#tanhlayer)s are configured similarly as,
+
+        layer{
+          name: "tanh1"
+          type: kTanh
+          tanh_conf {
+            outer_scale: 1.7159047
+            inner_scale: 0.6666667
+          }
+          srclayers:"fc1"
+        }
+
+  i.e., every value `x` from the source layer is transformed as `outer_scale * tanh(inner_scale * x)`.
+
+* The final [Softmax loss layer](http://singa.incubator.apache.org/docs/layer#softmaxloss) connects
+to the label layer and the last TanhLayer.
+
+        layer{
+          name: "loss"
+          type:kSoftmaxLoss
+          softmaxloss_conf{
+            topk:1
+          }
+          srclayers:"fc6"
+          srclayers:"label"
+        }
+
+### Updater
+The [normal SGD updater](http://singa.incubator.apache.org/docs/updater#updater) is selected.
+The learning rate is multiplied by 0.997 every 60 steps (i.e., every epoch).
+
+    updater{
+      type: kSGD
+      learning_rate{
+        base_lr: 0.001
+        type : kStep
+        step_conf{
+          change_freq: 60
+          gamma: 0.997
+        }
+      }
+    }
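+
+To see what this schedule does, assume the kStep policy computes
+`lr = base_lr * gamma^floor(step / change_freq)` (an assumption based on the fields
+above): after the first 60 steps the learning rate drops from 0.001 to
+0.001 * 0.997 = 0.000997, after 120 steps to 0.001 * 0.997^2 ≈ 0.000994, and so on.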
+
+### TrainOneBatch algorithm
+
+The MLP model is a feed-forward model, hence
+[Back-propagation algorithm]({{ BASE_PATH}}/docs/train-one-batch#back-propagation)
+is selected.
+
+    alg: kBP
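+
+A minimal sketch of how this field is typically embedded in the job configuration
+(the name of the enclosing `train_one_batch` block follows other SINGA examples
+and should be checked against *examples/mnist/job.conf*):
+
+    train_one_batch {
+      alg: kBP
+    }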
+
+
+### Cluster setting
+The following configuration sets a single worker and a single server for training.
+The [Training frameworks](http://singa.incubator.apache.org/docs/frameworks) page introduces configurations for several distributed
+training frameworks.
+
+    cluster {
+      nworker_groups: 1
+      nserver_groups: 1
+    }
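+
+Putting it together, the job configuration file is roughly a concatenation of the
+blocks above. The sketch below is illustrative only; top-level field names such as
+`train_steps` and `test_freq` are assumptions here, so consult
+*examples/mnist/job.conf* for the authoritative layout.
+
+    name: "mlp"
+    train_steps: 1000        # total number of training steps (illustrative)
+    test_steps: 10           # mini-batches evaluated per test phase (illustrative)
+    test_freq: 60            # run a test phase every 60 training steps (illustrative)
+    train_one_batch { alg: kBP }
+    updater { ... }          # the SGD updater shown above
+    neuralnet {
+      layer { ... }          # data, mnist, label, fc/tanh stages and the loss layer
+    }
+    cluster {
+      nworker_groups: 1
+      nserver_groups: 1
+      workspace: "examples/mnist"
+    }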

Added: incubator/singa/site/trunk/content/markdown/docs/neural-net.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/neural-net.md?rev=1700722&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/neural-net.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/neural-net.md Wed Sep  2 07:59:20 2015
@@ -0,0 +1,334 @@
+---
+layout: post
+title: NeuralNet
+category : docs
+tags : [installation, examples]
+---
+{% include JB/setup %}
+
+`NeuralNet` in SINGA represents an instance of a user's neural net model. As a
+neural net typically consists of a set of layers, `NeuralNet` comprises
+a set of unidirectionally connected [Layer](http://singa.incubator.apache.org/docs/layer)s.
+This page describes how to convert a user's neural net into
+the configuration of `NeuralNet`.
+
+<img src="http://singa.incubator.apache.org/assets/image/model-category.png" align="center" width="200px"/>
+<span><strong>Figure 1 - Categorization of popular deep learning models.</strong></span>
+
+## Net structure configuration
+
+Users configure the `NeuralNet` by listing all layers of the neural net and
+specifying each layer's source layer names. Popular deep learning models can be
+categorized as shown in Figure 1. The subsequent sections give details for each
+category.
+
+### Feed-forward models
+
+<div align = "left">
+<img src="http://singa.incubator.apache.org/assets/image/mlp-net.png" align="center" width="200px"/>
+<span><strong>Figure 2 - Net structure of a MLP model.</strong></span>
+</div>
+
+Feed-forward models, e.g., CNN and MLP, are easy to configure, as their layer
+connections are directed and contain no cycles. The
+configuration for the MLP model shown in Figure 2 is as follows,
+
+    net {
+      layer {
+        name : "data"
+        type : kData
+      }
+      layer {
+        name : "image"
+        type : kImage
+        srclayers: "data"
+      }
+      layer {
+        name : "label"
+        type : kLabel
+        srclayers: "data"
+      }
+      layer {
+        name : "hidden"
+        type : kHidden
+        srclayers: "image"
+      }
+      layer {
+        name : "softmax"
+        type : kSoftmaxLoss
+        srclayers: "hidden"
+        srclayers: "label"
+      }
+    }
+
+### Energy models
+
+<img src="http://singa.incubator.apache.org/assets/image/rbm-rnn.png" align="center" width="500px"/>
+<span><strong>Figure 3 - Convert connections in RBM and RNN.</strong></span>
+
+
+For energy models including RBM, DBM,
+etc., their connections are undirected (i.e., Category B). To represent these models using
+`NeuralNet`, users can simply replace each connection with two directed
+connections, as shown in Figure 3a. In other words, for each pair of connected layers, their source
+layer field should include each other's name.
+The full [RBM example](http://singa.incubator.apache.org/docs/rbm) has a
+detailed neural net configuration for an RBM model, which looks like
+
+    net {
+      layer {
+        name : "vis"
+        type : kVisLayer
+        param {
+          name : "w1"
+        }
+        srclayers: "hid"
+      }
+      layer {
+        name : "hid"
+        type : kHidLayer
+        param {
+          name : "w2"
+          share_from: "w1"
+        }
+        srclayers: "vis"
+      }
+    }
+
+### RNN models
+
+For recurrent neural networks (RNN), users can remove the recurrent connections
+by unrolling the recurrent layer.  For example, in Figure 3b, the original
+layer is unrolled into a new layer with 4 internal layers. In this way, the
+model becomes like a normal feed-forward model and can thus be configured similarly.
+The [RNN example](http://singa.incubator.apache.org/docs/rnn) has a full neural net
+configuration for an RNN model.
+
+
+## Configuration for multiple nets
+
+Typically, a training job includes three neural nets, for the
+training, validation and test phases respectively. The three neural nets share most
+layers except the data layer, loss layer, output layer, etc.  To avoid
+redundant configurations for the shared layers, users can use the `exclude`
+field to filter a layer out of a neural net, e.g., the following layer will be
+filtered out when creating the test `NeuralNet`.
+
+
+    layer {
+      ...
+      exclude : kTest # filter this layer for creating test net
+    }
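+
+For example, mirroring the MLP example, a training-only data layer and a
+test-only data layer can be configured side by side (a sketch; layer names and
+shard paths are illustrative):
+
+    layer {
+      name: "data"
+      type: kShardData
+      sharddata_conf { path: "train_shard" }
+      exclude: kTest   # only used when creating the training net
+    }
+    layer {
+      name: "data"
+      type: kShardData
+      sharddata_conf { path: "test_shard" }
+      exclude: kTrain  # only used when creating the test net
+    }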
+
+
+
+## Neural net partitioning
+
+A neural net can be partitioned in different ways to distribute the training
+over multiple workers.
+
+### Batch and feature dimension
+
+<img src="http://singa.incubator.apache.org/assets/image/partition_fc.png" align="center" width="400px"/>
+<span><strong>Figure 4 - Partitioning of a fully connected layer.</strong></span>
+
+
+Every layer's feature blob is considered a matrix whose rows are feature
+vectors. Thus, one layer can be split on two dimensions. Partitioning on
+dimension 0 (also called batch dimension) slices the feature matrix by rows.
+For instance, if the mini-batch size is 256 and the layer is partitioned into 2
+sub-layers, each sub-layer would have 128 feature vectors in its feature blob.
+Partitioning on this dimension has no effect on the parameters, as every
+[Param](http://singa.incubator.apache.org/docs/param) object is replicated in the sub-layers. Partitioning on dimension
+1 (also called feature dimension) slices the feature matrix by columns. For
+example, suppose the original feature vector has 50 units, after partitioning
+into 2 sub-layers, each sub-layer would have 25 units. This partitioning may
+result in [Param](http://singa.incubator.apache.org/docs/param) objects being split, as shown in
+Figure 4. Both the bias vector and weight matrix are
+partitioned into two sub-layers.
+
+
+### Partitioning configuration
+
+There are 4 partitioning schemes, whose configurations are given below,
+
+  1. Partitioning each single layer into sub-layers on the batch dimension (see
+  below). It is enabled by configuring the partition dimension of the layer to
+  0, e.g.,
+
+          # with other fields omitted
+          layer {
+            partition_dim: 0
+          }
+
+  2. Partitioning each single layer into sub-layers on the feature dimension (see
+  below).  It is enabled by configuring the partition dimension of the layer to
+  1, e.g.,
+
+          # with other fields omitted
+          layer {
+            partition_dim: 1
+          }
+
+  3. Partitioning all layers into different subsets. It is enabled by
+  configuring the location ID of a layer, e.g.,
+
+          # with other fields omitted
+          layer {
+            location: 1
+          }
+          layer {
+            location: 0
+          }
+
+
+  4. Hybrid partitioning of strategy 1, 2 and 3. The hybrid partitioning is
+  useful for large models. An example application is to implement the
+  [idea proposed by Alex](http://arxiv.org/abs/1404.5997).
+  Hybrid partitioning is configured like,
+
+          # with other fields omitted
+          layer {
+            location: 1
+          }
+          layer {
+            location: 0
+          }
+          layer {
+            partition_dim: 0
+            location: 0
+          }
+          layer {
+            partition_dim: 1
+            location: 0
+          }
+
+Currently SINGA supports strategy-2 well. The other partitioning strategies are
+still under test and will be released in a later version.
+
+## Parameter sharing
+
+Parameters can be shared in the following cases,
+
+  * sharing parameters among layers via user configuration. For example, the
+  visible layer and hidden layer of an RBM share the weight matrix, which is configured through
+  the `share_from` field as shown in the above RBM configuration. The
+  configurations must be the same (except name) for shared parameters.
+
+  * due to neural net partitioning, some `Param` objects are replicated into
+  different workers, e.g., partitioning one layer on batch dimension. These
+  workers share parameter values. SINGA controls this kind of parameter
+  sharing automatically, users do not need to do any configuration.
+
+  * the `NeuralNet`s for training and testing (and validation) share most
+  layers, and thus share `Param` values.
+
+If the shared `Param` instances reside in the same process (possibly in different
+threads), they use the same chunk of memory space for their values, but they
+have separate memory spaces for their gradients. In fact, their
+gradients will be averaged by the [stub]() or [server]().
+
+
+{% comment %}
+## Advanced user guide
+
+### Creation
+
+    static shared_ptr<NeuralNet> NeuralNet::Create(const NetProto& np, Phase phase, int num);
+
+The above function creates a `NeuralNet` for a given phase, and returns a
+shared pointer to the `NeuralNet` instance. The phase is in {kTrain,
+kValidation, kTest}. `num` is used for net partitioning which indicates the
+number of partitions.  Typically, a training job includes three neural nets for
+training, validation and test phase respectively. The three neural nets share most
+layers except the data layer, loss layer or output layer, etc.. The `Create`
+function takes in the full net configuration including layers for training,
+validation and test.  It removes layers for phases other than the specified
+phase based on the `exclude` field in
+[layer configuration](http://singa.incubator.apache.org/docs/layer):
+
+    layer {
+      ...
+      exclude : kTest # filter this layer for creating test net
+    }
+
+The filtered net configuration is passed to the constructor of `NeuralNet`:
+
+    NeuralNet::NeuralNet(NetProto netproto, int npartitions);
+
+The constructor first creates a graph representing the net structure in
+
+    Graph* NeuralNet::CreateGraph(const NetProto& netproto, int npartitions);
+
+Next, it creates a layer for each node and connects layers if their nodes are
+connected.
+
+    void NeuralNet::CreateNetFromGraph(Graph* graph, int npartitions);
+
+Since the `NeuralNet` instance may be shared among multiple workers, the
+`Create` function returns a shared pointer to the `NeuralNet` instance.
+
+### Parameter sharing
+
+ `Param` sharing
+is enabled by first sharing the Param configuration (in `NeuralNet::Create`)
+to create two similar (e.g., the same shape) Param objects, and then calling
+(in `NeuralNet::CreateNetFromGraph`),
+
+    void Param::ShareFrom(const Param& from);
+
+It is also possible to share `Param`s of two nets, e.g., sharing parameters of
+the training net and the test net,
+
+    void NeuralNet::ShareParamsFrom(shared_ptr<NeuralNet> other);
+
+It will call `Param::ShareFrom` for each Param object.
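+
+For example, sharing parameters from the training net to the test net could look
+like the following sketch, based only on the declarations above (error handling
+and the surrounding Worker code are omitted):
+
+    // create nets for the training and test phases from the same NetProto
+    auto train_net = NeuralNet::Create(net_proto, kTrain, 1);
+    auto test_net = NeuralNet::Create(net_proto, kTest, 1);
+    // let the test net reuse the Param values of the training net
+    test_net->ShareParamsFrom(train_net);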
+
+### Access functions
+`NeuralNet` provides a couple of access functions to get the layers and params
+of the net:
+
+    const std::vector<Layer*>& layers() const;
+    const std::vector<Param*>& params() const ;
+    Layer* name2layer(string name) const;
+    Param* paramid2param(int id) const;
+
+
+### Partitioning
+
+
+#### Implementation
+
+SINGA partitions the neural net in the `CreateGraph` function, which creates one
+node for each (partitioned) layer. For example, if one layer's partition
+dimension is 0 or 1, then it creates `npartitions` nodes for it; if the
+partition dimension is -1, a single node is created, i.e., no partitioning.
+Each node is assigned a partition (or location) ID. If the original layer is
+configured with a location ID, then the ID is assigned to each newly created node.
+These nodes are connected according to the connections of the original layers.
+Some connection layers will be added automatically.
+For instance, if two connected sub-layers are located at two
+different workers, then a pair of bridge layers is inserted to transfer the
+feature (and gradient) blob between them. When two layers are partitioned on
+different dimensions, a concatenation layer which concatenates feature rows (or
+columns) and a slice layer which slices feature rows (or columns) would be
+inserted. These connection layers help make the network communication and
+synchronization transparent to the users.
+
+#### Dispatching partitions to workers
+
+Each (partitioned) layer is assigned a location ID, based on which it is dispatched to one
+worker. Particularly, the shared pointer to the `NeuralNet` instance is passed
+to every worker within the same group, but each worker only computes over the
+layers that have the same partition (or location) ID as the worker's ID.  When
+every worker computes the gradients of the entire model parameters
+(strategy-2), we refer to this process as data parallelism.  When different
+workers compute the gradients of different parameters (strategy-3 or
+strategy-1), we call this process model parallelism.  The hybrid partitioning
+leads to hybrid parallelism where some workers compute the gradients of the
+same subset of model parameters while other workers compute on different model
+parameters.  For example, to implement the hybrid parallelism for the
+[DCNN model](http://arxiv.org/abs/1404.5997), we set `partition_dim = 0` for
+lower layers and `partition_dim = 1` for higher layers.
+
+{% endcomment %}

Added: incubator/singa/site/trunk/content/markdown/docs/overview.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/overview.md?rev=1700722&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/overview.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/overview.md Wed Sep  2 07:59:20 2015
@@ -0,0 +1,105 @@
+---
+layout: post
+title: Introduction
+category: docs
+---
+{% include JB/setup %}
+
+SINGA is a general distributed deep learning platform for training big deep
+learning models over large datasets. It is designed with an intuitive
+programming model based on the layer abstraction. A variety
+of popular deep learning models are supported, namely feed-forward models including
+convolutional neural networks (CNN), energy models like restricted Boltzmann
+machine (RBM), and recurrent neural networks (RNN). Many built-in layers are
+provided for users. The SINGA architecture is
+sufficiently flexible to run synchronous, asynchronous and hybrid training
+frameworks.  SINGA
+also supports different neural net partitioning schemes to parallelize the
+training of large models, namely partitioning on batch dimension, feature
+dimension or hybrid partitioning.
+
+
+## Goals
+
+As a distributed system, the first goal of SINGA is to have good scalability. In other
+words, SINGA is expected to reduce the total training time needed to achieve a certain
+accuracy when given more computing resources (i.e., machines).
+
+
+The second goal is to make SINGA easy to use.
+It is non-trivial for programmers to develop and train models with deep and
+complex model structures.  Distributed training further increases the burden of
+programmers, e.g., data and model partitioning, and network communication.  Hence it is essential to
+provide an easy-to-use programming model so that users can implement their deep
+learning models/algorithms without much awareness of the underlying distributed
+platform.
+
+## Principles
+
+Scalability is a challenging research problem for distributed deep learning
+training. SINGA provides a general architecture to exploit the scalability of
+different training frameworks. Synchronous training frameworks improve the
+efficiency of one training iteration, and
+asynchronous training frameworks improve the convergence rate. Given a fixed budget
+(e.g., cluster size), users can run a hybrid framework that maximizes the
+scalability by trading off between efficiency and convergence rate.
+
+SINGA comes with a programming model designed around the layer abstraction, which
+is intuitive for deep learning models.  A variety of
+popular deep learning models can be expressed and trained using this programming model.
+
+
+{% comment %}
+consists of multiple layers.  Each layer is associated with a feature
+transformation
+function. After going through all layers, the raw input feature (e.g., pixels
+of images) would be converted into a high-level feature that is easier for
+tasks like classification.
+{% endcomment %}
+
+## System overview
+
+<img src="http://singa.incubator.apache.org/assets/image/sgd.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SGD flow.</strong></span>
+
+Training a deep learning model means finding the optimal parameters involved in
+the transformation functions that generate good features for specific tasks.
+The goodness of a set of parameters is measured by a loss function, e.g.,
+[Cross-Entropy Loss](https://en.wikipedia.org/wiki/Cross_entropy). Since the
+loss functions are usually non-linear and non-convex, it is difficult to get a
+closed form solution. Typically, people use the stochastic gradient descent
+(SGD) algorithm, which randomly
+initializes the parameters and then iteratively updates them to reduce the loss
+as shown in Figure 1.
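+
+In its simplest form, each SGD iteration updates every parameter `w` as
+`w = w - lr * dL/dw`, where `lr` is the learning rate and `dL/dw` is the gradient
+of the loss computed over a randomly sampled mini-batch.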
+
+<img src="http://singa.incubator.apache.org/assets/image/overview.png" align="center" width="400px"/>
+<span><strong>Figure 2 - SINGA overview.</strong></span>
+
+SGD is used in SINGA to train
+parameters of deep learning models. The training workload is distributed over
+worker and server units as shown in Figure 2. In each
+iteration, every worker calls *TrainOneBatch* function to compute
+parameter gradients. *TrainOneBatch* takes a *NeuralNet* object
+representing the neural net, and visits layers of the *NeuralNet* in
+certain order. The resultant gradients are sent to the local stub that
+aggregates the requests and forwards them to corresponding servers for
+updating. Servers reply to workers with the updated parameters for the next
+iteration.
+
+
+## Job submission
+
+To submit a job in SINGA (i.e., training a deep learning model),
+users pass the job configuration to the SINGA driver in the
+[main function](http://singa.incubator.apache.org/docs/programming-guide). The job configuration
+specifies the four major components in Figure 2,
+
+  * a [NeuralNet](http://singa.incubator.apache.org/docs/neural-net) describing the neural net structure with the detailed layer setting and their connections;
+  * a [TrainOneBatch](http://singa.incubator.apache.org/docs/train-one-batch) algorithm which is tailored for different model categories;
+  * an [Updater](http://singa.incubator.apache.org/docs/updater) defining the protocol for updating parameters at the server side;
+  * a [Cluster Topology](http://singa.incubator.apache.org/docs/distributed-training) specifying the distributed architecture of workers and servers.
+
+This process is like the job submission in Hadoop, where users configure their
+jobs in the main function to set the mapper, reducer, etc.
+In Hadoop, users can configure their jobs with their own (or built-in) mapper and reducer; in SINGA, users
+can configure their jobs with their own (or built-in) layer, updater, etc.

Added: incubator/singa/site/trunk/content/markdown/docs/param.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/param.md?rev=1700722&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/param.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/param.md Wed Sep  2 07:59:20 2015
@@ -0,0 +1,230 @@
+---
+layout: post
+title: Param
+category : docs
+tags : [parameter ]
+---
+{% include JB/setup %}
+
+A `Param` object in SINGA represents a set of parameters, e.g., a weight matrix
+or a bias vector. *Basic user guide* describes how to configure for a `Param`
+object, and *Advanced user guide* provides details on implementing users'
+parameter initialization methods.
+
+## Basic user guide
+
+The configuration of a Param object is nested inside a layer configuration, as
+`Param` objects are associated with layers. An example configuration looks like
+
+    layer {
+      ...
+      param {
+        name : "p1"
+        init {
+          type : kConstant
+          value: 1
+        }
+      }
+    }
+
+The [SGD algorithm](http://singa.incubator.apache.org/docs/overview) starts with initializing all
+parameters according to user specified initialization method (the `init` field).
+For the above example,
+all parameters in `Param` "p1" will be initialized to constant value 1. The
+configuration fields of a Param object are defined in [ParamProto](http://singa.incubator.apache.org/api/classsinga_1_1ParamProto.html):
+
+  * name, an identifier string. It is an optional field. If not provided, SINGA
+  will generate one based on the layer name and the Param's order in the layer.
+  * init, field for setting initialization methods.
+  * share_from, name of another `Param` object, from which this `Param` will share
+  configurations and values.
+  * lr_scale, float value to be multiplied with the learning rate when
+  [updating the parameters](http://singa.incubator.apache.org/docs/updater)
+  * wd_scale, float value to be multiplied with the weight decay when
+  [updating the parameters](http://singa.incubator.apache.org/docs/updater)
+
+There are some other fields that are specific to initialization methods.
+
+### Initialization methods
+
+Users can set the `type` of `init` to use one of the following built-in initialization
+methods,
+
+  * `kConstant`, sets all parameters of the Param object to a constant value
+
+        type: kConstant
+        value: float  # default is 1
+
+  * `kGaussian`, initializes the parameters following a Gaussian distribution.
+
+        type: kGaussian
+        mean: float # mean of the Gaussian distribution, default is 0
+        std: float # standard deviation, default is 1
+        value: float # default 0
+
+  * `kUniform`, initializes the parameters following a uniform distribution
+
+        type: kUniform
+        low: float # lower boundary, default is -1
+        high: float # upper boundary, default is 1
+        value: float # default 0
+
+  * `kGaussianSqrtFanIn`, initializes two-dimensional (i.e., matrix) `Param`
+  objects using `kGaussian` and then
+  multiplies each parameter by `1/sqrt(fan_in)`, where `fan_in` is the number of
+  columns of the matrix.
+
+  * `kUniformSqrtFanIn`, the same as `kGaussianSqrtFanIn` except that the
+  distribution is a uniform distribution.
+
+  * `kUniformFanInOut`, initializes matrix `Param` objects using `kUniform` and then
+  multiplies each parameter by `sqrt(6/(fan_in + fan_out))`, where `fan_in +
+  fan_out` sums the number of columns and rows of the matrix.
+
+For all of the above initialization methods except `kConstant`, if `value` is not
+1, every parameter will be multiplied by `value`. Users can also implement
+their own initialization method following the *Advanced user guide*.
+
+
+## Advanced user guide
+
+This section describes the details of implementing new parameter
+initialization methods.
+
+### Base ParamGenerator
+All initialization methods are implemented as
+subclasses of the base `ParamGenerator` class.
+
+    class ParamGenerator {
+     public:
+      virtual void Init(const ParamGenProto&);
+      void Fill(Param*);
+
+     protected:
+      ParamGenProto proto_;
+    };
+
+The configuration of the initialization method is stored in `ParamGenProto`. The `Fill`
+function fills the `Param` object (passed in as an argument).
+
+### New ParamGenerator subclass
+
+Similar to implementing a new Layer subclass, users can define a configuration
+protocol message,
+
+    # in user.proto
+    message FooParamProto {
+      optional int32 x = 1;
+    }
+    extend ParamGenProto {
+      optional FooParamProto fooparam_conf = 101;
+    }
+
+The configuration of `Param` would be
+
+    param {
+      ...
+      init {
+        user_type: "FooParam" # must use user_type for user-defined methods
+        [fooparam_conf] { # must use brackets for configuring user defined messages
+          x: 10
+        }
+      }
+    }
+
+The subclass could be declared as,
+
+    class FooParamGen : public ParamGenerator {
+     public:
+      void Fill(Param*) override;
+    };
+
+Users can access the configuration fields in `Fill` by
+
+    int x = proto_.GetExtension(fooparam_conf).x();
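+
+A skeleton of the subclass implementation could therefore look like the sketch
+below (how the values are actually written into the `Param` object depends on
+the Param API and is left as a comment):
+
+    void FooParamGen::Fill(Param* param) {
+      // read the user-defined field from the extension message
+      int x = proto_.GetExtension(fooparam_conf).x();
+      // ... use x to compute and write the initial values of param ...
+    }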
+
+To use the new initialization method, users need to register it in the
+[main function](http://singa.incubator.apache.org/docs/programming-guide).
+
+    driver.RegisterParamGenerator<FooParamGen>("FooParam");  // must be consistent with the user_type in the configuration
+
+{% comment %}
+### Base Param class
+
+### Members
+
+    int local_version_;
+    int slice_start_;
+    vector<int> slice_offset_, slice_size_;
+
+    shared_ptr<Blob<float>> data_;
+    Blob<float> grad_;
+    ParamProto proto_;
+
+Each Param object has a local version and a global version (inside the data
+Blob). These two versions are used for synchronization. If multiple Param
+objects share the same values, they would have the same `data_` field.
+Consequently, their global version is the same. The global version is updated
+by [the stub thread](http://singa.incubator.apache.org/docs/communication). The local version is
+updated in `Worker::Update` function which assigns the global version to the
+local version. The `Worker::Collect` function is blocked until the global
+version is larger than the local version, i.e., when `data_` is updated. In
+this way, we synchronize workers sharing parameters.
+
+In deep learning models, some Param objects are 100 times larger than others.
+To ensure the load-balance among servers, SINGA slices large Param objects. The
+slicing information is recorded by `slice_*`. Each slice is assigned a unique
+ID starting from 0. `slice_start_` is the ID of the first slice of this Param
+object. `slice_offset_[i]` is the offset of the i-th slice in this Param
+object. `slice_size_[i]` is the size of the i-th slice. This slicing information
+is used to create messages for transferring parameter values or gradients to
+different servers.
+
+Each Param object has a `grad_` field for gradients. Param objects do not share
+this Blob although they may share `data_`, because each layer containing a
+Param object contributes its own gradients. E.g., in an RNN, the recurrent layers
+share parameter values, and the gradients used for updating are averaged over
+all these recurrent layers. In SINGA, the [stub thread] will aggregate local
+gradients for the same Param object. The server will do a global aggregation
+of gradients for the same Param object.
+
+The `proto_` field has some meta information, e.g., name and ID. It also has a
+field called `owner` which is the ID of the Param object that shares parameter
+values with others.
+
+### Functions
+The base Param class implements two sets of functions,
+
+    virtual void InitValues(int version = 0);  // initialize values according to `init_method`
+    void ShareFrom(const Param& other);  // share `data_` from `other` Param
+    --------------
+    virtual Msg* GenGetMsg(bool copy, int slice_idx);
+    virtual Msg* GenPutMsg(bool copy, int slice_idx);
+    ... // other message related functions.
+
+Besides the functions for processing the parameter values, there is a set of
+functions for generating and parsing messages. These messages are for
+transferring parameter values or gradients between workers and servers. Each
+message corresponds to one Param slice. If `copy` is false, it means the
+receiver of this message is in the same process as the sender. In that case,
+only pointers to the memory of parameter value (or gradient) are wrapped in
+the message; otherwise, the parameter values (or gradients) should be copied
+into the message.
+
+
+## Implementing a Param subclass
+Users can extend the base Param class to implement their own parameter
+initialization methods and message transferring protocols. Similar to
+implementing a new Layer subclass, users can create Google protocol buffer
+messages for configuring the Param subclass. The subclass, denoted as FooParam,
+should be registered in main.cc,
+
+    driver.RegisterParam<FooParam>(kFooParam);  // kFooParam should be different from 0, which is reserved for the base Param type
+
+
+  * type, an integer representing the `Param` type. Currently SINGA provides one
+    `Param` implementation with type 0 (the default type). If users want
+    to use their own Param implementation, they should extend the base Param
+    class and configure this field with `kUserParam`.
+
+{% endcomment %}

Added: incubator/singa/site/trunk/content/markdown/docs/programming-guide.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/programming-guide.md?rev=1700722&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/programming-guide.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/programming-guide.md Wed Sep  2 07:59:20 2015
@@ -0,0 +1,100 @@
+---
+layout: post
+title: Programming Guide
+category : docs
+tags : [programming]
+---
+{% include JB/setup %}
+
+
+To submit a training job, users must provide the configuration of the
+four components shown in Figure 1:
+
+  * a [NeuralNet](http://singa.incubator.apache.org/docs/neural-net) describing the neural net structure with the detailed layer setting and their connections;
+  * a [TrainOneBatch](http://singa.incubator.apache.org/docs/train-one-batch) algorithm which is tailored for different model categories;
+  * an [Updater](http://singa.incubator.apache.org/docs/updater) defining the protocol for updating parameters at the server side;
+  * a [Cluster Topology](http://singa.incubator.apache.org/docs/distributed-training) specifying the distributed architecture of workers and servers.
+
+The *Basic user guide* section describes how to submit a training job using
+built-in components; while the *Advanced user guide* section presents details
+on writing user's own main function to register components implemented by
+themselves. In addition, the training data must be prepared, which has the same
+[process](http://singa.incubator.apache.org/docs/data) for both advanced users and basic users.
+
+<img src="http://singa.incubator.apache.org/assets/image/overview.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SINGA overview.</strong></span>
+
+
+
+## Basic user guide
+
+Users can use the default main function provided by SINGA to submit the training
+job. In this case, a job configuration file written as a Google protocol
+buffer message for the [JobProto](http://singa.incubator.apache.org/api/classsinga_1_1JobProto.html) must be provided on the command line,
+
+    ./bin/singa-run.sh -conf <path to job conf> [-resume]
+
+`-resume` is for continuing the training from last
+[checkpoint](http://singa.incubator.apache.org/docs/checkpoint).
+The [MLP](http://singa.incubator.apache.org/docs/mlp) and [CNN](http://singa.incubator.apache.org/docs/cnn)
+examples use built-in components. Please read the corresponding pages for their
+job configuration files. The subsequent pages will illustrate the details on
+each component of the configuration.
+
+## Advanced user guide
+
+If a user's model contains some user-defined components, e.g.,
+[Updater](http://singa.incubator.apache.org/docs/updater), he has to write a main function to
+register these components. It is similar to Hadoop's main function. Generally,
+the main function should
+
+  * initialize SINGA, e.g., setup logging.
+
+  * register user-defined components.
+
+  * create and pass the job configuration to the SINGA driver.
+
+
+An example main function is like
+
+    #include "singa.h"
+    #include "user.h"  // header for user code
+
+    int main(int argc, char** argv) {
+      singa::Driver driver;
+      driver.Init(argc, argv);
+      bool resume;
+      // parse resume option from argv.
+
+      // register user defined layers
+      driver.RegisterLayer<FooLayer>(kFooLayer);
+      // register user defined updater
+      driver.RegisterUpdater<FooUpdater>(kFooUpdater);
+      ...
+      auto jobConf = driver.job_conf();
+      //  update jobConf
+
+      driver.Submit(resume, jobConf);
+      return 0;
+    }
+
+The Driver class's `Init` method loads the job configuration file provided by
+users as a command line argument (`-conf <job conf>`). This file contains at least the
+cluster topology; `driver.job_conf()` returns the `jobConf` for users to update or fill in
+configurations of the neural net, updater, etc. If users define subclasses of
+Layer, Updater, Worker or Param, they should register them through the driver.
+Finally, the job configuration is submitted to the driver which starts the
+training.
+
+We will provide helper functions to make the configuration easier in the
+future, like [keras](https://github.com/fchollet/keras).
+
+Users need to compile and link their code (e.g., layer implementations and the main
+file) with the SINGA library (*.libs/libsinga.so*) to generate an
+executable file, e.g., named *mysinga*.  To launch the program, users just pass the
+path of *mysinga* and the base job configuration to *./bin/singa-run.sh*.
+
+    ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other arguments]
+
+The [RNN application](http://singa.incubator.apache.org/docs/rnn) provides a full example of
+implementing the main function for training a specific RNN model.

Added: incubator/singa/site/trunk/content/markdown/docs/quick-start.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/quick-start.md?rev=1700722&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/quick-start.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/quick-start.md Wed Sep  2 07:59:20 2015
@@ -0,0 +1,201 @@
+---
+layout: post
+title: Quick Start
+category : docs
+tags : [installation, examples]
+---
+{% include JB/setup %}
+
+## SINGA setup
+
+Please refer to the
+[installation](http://singa.incubator.apache.org/docs/installation) page
+for guidance on installing SINGA.
+
+### Starting Zookeeper
+
+SINGA uses [zookeeper](https://zookeeper.apache.org/) to coordinate the
+training.  Please make sure the zookeeper service is started before running
+SINGA.
+
+If you installed zookeeper using our thirdparty script, you can
+simply start it by:
+
+    #goto top level folder
+    cd  SINGA_ROOT
+    ./bin/zk-service start
+
+(`./bin/zk-service stop` stops the zookeeper).
+
+Otherwise, if you launched zookeeper by yourself and did not use the
+default port, please edit `conf/singa.conf`:
+
+    zookeeper_host: "localhost:YOUR_PORT"
+
+## Running in standalone mode
+
+Running SINGA in standalone mode means running it without cluster
+managers like [Mesos](http://mesos.apache.org/) or [YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).
+
+{% comment %}
+For standalone mode, users have to manage the resources manually. For
+instance, they have to prepare a host file containing all running nodes.
+There is no restriction on CPU and memory resources, hence SINGA consumes as much
+CPU and memory resources as it needs.
+{% endcomment %}
+
+### Training on a single node
+
+For single node training, one process will be launched to run SINGA on the
+local host. We train the [CNN model](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks) over the
+[CIFAR-10](http://www.cs.toronto.edu/~kriz/cifar.html) dataset as an example.
+The hyper-parameters are set following
+[cuda-convnet](https://code.google.com/p/cuda-convnet/). More details are
+available at the [CNN example](http://singa.incubator.apache.org/docs/cnn).
+
+
+#### Preparing data and job configuration
+
+Download the dataset and create the data shards for training and testing.
+
+    cd examples/cifar10/
+    make download
+    make create
+
+A training dataset and a test dataset are created under *cifar10-train-shard*
+and *cifar10-test-shard* folders respectively. An *image_mean.bin* file is also
+generated, which contains the feature mean of all images.
+
+Since all code used for training this CNN model is provided by SINGA as a
+built-in implementation, there is no need to write any code. Instead, users just
+execute the running script (*../../bin/singa-run.sh*) by providing the job
+configuration file (*job.conf*). To code in SINGA, please refer to the
+[programming guide](http://singa.incubator.apache.org/docs/programming-guide).
+
+#### Training without parallelism
+
+By default, the cluster topology has a single worker and a single server.
+In other words, neither the training data nor the neural net is partitioned.
+
+The training is started by running:
+
+    # goto top level folder
+    cd ../../
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+
+You can list the current running jobs by,
+
+    ./bin/singa-console.sh list
+
+    JOB ID    |NUM PROCS
+    ----------|-----------
+    24        |1
+
+Jobs can be killed by,
+
+    ./bin/singa-console.sh kill JOB_ID
+
+
+Logs and job information are available in the */tmp/singa-log* folder, which can be
+changed to other folders by setting `log-dir` in *conf/singa.conf*.
+
+{% comment %}
+One worker group trains against one partition of the training dataset. If
+*nworker_groups* is set to 1, then there is no data partitioning. One worker
+runs over a partition of the model. If *nworkers_per_group* is set to 1, then
+there is no model partitioning. More details on the cluster configuration are
+described in the [System Architecture]() page.
+{% endcomment %}
+
+#### Asynchronous parallel training
+
+    # job.conf
+    ...
+    cluster {
+      nworker_groups: 2
+      nworkers_per_procs: 2
+      workspace: "examples/cifar10/"
+    }
+
+In SINGA, [asynchronous training](http://singa.incubator.apache.org/docs/architecture) is enabled
+by launching multiple worker groups. For example, we can change the original
+*job.conf* to have two worker groups as shown above. By default, each
+worker group has one worker. Since one process is set to contain two workers,
+the two worker groups will run in the same process.  Consequently, they run
+the in-memory [Downpour](http://singa.incubator.apache.org/docs/frameworks) training framework.
+Users do not need to split the dataset
+explicitly for each worker (group); instead, they can assign each worker (group) a
+random offset into the dataset. The workers then run as if on
+different data partitions.
+
+    # job.conf
+    ...
+    neuralnet {
+      layer {
+        ...
+        sharddata_conf {
+          random_skip: 5000
+        }
+      }
+      ...
+    }
+
+The running command is:
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+#### Synchronous parallel training
+
+    # job.conf
+    ...
+    cluster {
+      nworkers_per_group: 2
+      nworkers_per_procs: 2
+      workspace: "examples/cifar10/"
+    }
+
+In SINGA, [synchronous training](http://singa.incubator.apache.org/docs/architecture) is enabled
+by launching multiple workers within one worker group. For instance, we can
+change the original *job.conf* to have two workers in one worker group as shown
+above. The workers will run synchronously
+as they are from the same worker group. This framework is the in-memory
+[sandblaster](http://singa.incubator.apache.org/docs/frameworks).
+The model is partitioned among the two workers. Specifically, each layer is
+sliced over the two workers.  The sliced layer
+is the same as the original layer except that it only has `B/g` feature
+instances, where `B` is the number of instances in a mini-batch, `g` is the number of
+workers in a group. It is also possible to partition the layer (or neural net)
+using [other schemes](http://singa.incubator.apache.org/docs/neural-net).
+All other settings are the same as running without partitioning:
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+### Training in a cluster
+
+We can extend the above two training frameworks to a cluster by updating the
+cluster configuration with:
+
+    nworkers_per_procs: 1
+
+Every process would then create only one worker thread. Consequently, the workers
+would be created in different processes (i.e., on different nodes). A *hostfile*
+must be provided under *SINGA_ROOT/conf/*, specifying the nodes in the cluster,
+e.g.,
+
+    logbase-a01
+    logbase-a02
+
+The running command is the same as for single node training:
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+## Running with Mesos
+
+*work in progress*...
+
+
+## Where to go next
+
+The [programming guide](http://singa.incubator.apache.org/docs/programming-guide) page
+describes how to submit a training job in SINGA.

Modified: incubator/singa/site/trunk/content/markdown/docs/rbm.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/rbm.md?rev=1700722&r1=1700721&r2=1700722&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/rbm.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/rbm.md Wed Sep  2 07:59:20 2015
@@ -0,0 +1,398 @@
+---
+layout: post
+title: Example --- Restricted Boltzmann Machine
+category : docs
+tags : [rbm, example]
+---
+{% include JB/setup %}
+
+This example uses SINGA to train 4 RBM models and one auto-encoder model over the
+[MNIST dataset](http://yann.lecun.com/exdb/mnist/). The auto-encoder model is trained
+to reduce the dimensionality of the MNIST image feature. The RBM models are trained
+to initialize the parameters of the auto-encoder model. This example application is
+from [Hinton's Science paper](http://www.cs.toronto.edu/~hinton/science.pdf).
+
+## Running instructions
+
+Running scripts are provided in *SINGA_ROOT/examples/rbm* folder.
+
+The MNIST dataset has 70,000 handwritten digit images. The
+[data preparation](http://singa.incubator.apache.org/docs/data) page
+has details on converting this dataset into SINGA recognizable format (i.e.,
+[DataShard](http://singa.incubator.apache.org/api/classsinga_1_1DataShard.html)). Users can
+simply run the following commands to download and convert the dataset.
+
+    # at SINGA_ROOT/examples/rbm/
+    $ cp Makefile.example Makefile
+    $ make download
+    $ make create
+
+The training is separated into two phases, namely pre-training and fine-tuning.
+The pre-training phase trains 4 RBMs in sequence,
+
+    # at SINGA_ROOT/
+    $ ./bin/singa-run.sh -conf examples/rbm/rbm0.conf
+    $ ./bin/singa-run.sh -conf examples/rbm/rbm1.conf
+    $ ./bin/singa-run.sh -conf examples/rbm/rbm2.conf
+    $ ./bin/singa-run.sh -conf examples/rbm/rbm3.conf
+
+The fine-tuning phase trains the auto-encoder by,
+
+    $ ./bin/singa-run.sh -conf examples/rbm/autoencoder.conf
+
+
+## Training details
+
+### RBM0
+
+<img src="http://singa.incubator.apache.org/assets/image/RBM0_new.PNG" align="center" width="200px"/>
+<span><strong>Figure 1 - RBM0.</strong></span>
+
+The neural net structure for training RBM0 is shown in Figure 1.
+The data layer and parser layer provide features for training RBM0.
+The visible layer (connected with the parser layer) of RBM0 accepts the image feature
+(784 dimensions). The hidden layer is set to have 1000 neurons (units).
+These two layers are configured as,
+
+    layer{
+      name: "RBMVis"
+      type: kRBMVis
+      srclayers:"mnist"
+      srclayers:"RBMHid"
+      rbmvis_conf{
+        num_output: 1000
+      }
+      param{
+        name: "w0"
+        init{
+          type: kGaussian
+          mean: 0.0
+          std: 0.1
+        }
+      }
+      param{
+        name: "b0"
+        init{
+          type: kConstant
+          value: 0.0
+        }
+      }
+    }
+
+    layer{
+      name: "RBMHid"
+      type: kRBMHid
+      srclayers:"RBMVis"
+      rbmhid_conf{
+        hid_dim: 1000
+      }
+      param{
+        name: "w0_"
+        share_from: "w0"
+      }
+      param{
+        name: "b1"
+        init{
+          type: kConstant
+          value: 0.0
+        }
+      }
+    }
+
+
+
+For an RBM, the weight matrix is shared by the visible and hidden layers. For instance,
+`w0` is shared by the `vis` and `hid` layers shown in Figure 1. In SINGA, we can configure
+the `share_from` field to enable [parameter sharing](http://singa.incubator.apache.org/docs/param),
+as shown above for the params `w0` and `w0_`.
+
+[Contrastive Divergence](http://singa.incubator.apache.org/docs/train-one-batch/#contrastive-divergence)
+is configured as the algorithm for [TrainOneBatch](http://singa.incubator.apache.org/docs/train-one-batch).
+Following Hinton's paper, we configure the [updating protocol](http://singa.incubator.apache.org/docs/updater/)
+as follows,
+
+    # Updater Configuration
+    updater{
+      type: kSGD
+      momentum: 0.9
+      weight_decay: 0.0002
+      learning_rate{
+        base_lr: 0.1
+        type: kFixed
+      }
+    }
+
+Since the parameters of RBM0 will be used to initialize the auto-encoder, we should
+configure the `workspace` field to specify a path for the checkpoint folder.
+For example, if we configure it as,
+
+    cluster {
+      workspace: "SINGA_ROOT/rbm0/"
+    }
+
+Then SINGA will [checkpoint the parameters](http://singa.incubator.apache.org/docs/checkpoint) into *SINGA_ROOT/rbm0/*.
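+
+After training finishes, the checkpoint file can be located under the workspace.
+The listing below is a sketch; the step number in the file name depends on the
+configured number of training steps,
+
+    # assuming the workspace configured above
+    $ ls SINGA_ROOT/rbm0/checkpoint/
+    step6000-worker0.bin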
+
+### RBM1
+<img src="http://singa.incubator.apache.org/assets/image/RBM1_new.PNG" align="center" width="200px"/>
+<span><strong>Figure 2 - RBM1.</strong></span>
+
+Figure 2 shows the net structure for training RBM1.
+The visible units of RBM1 accept the output from the Sigmoid1 layer. The Inner1 layer
+is an `InnerProductLayer` whose parameters are set to the `w0` and `b1` learned
+from RBM0.
+The neural net configuration is (with the data layer and parser layer omitted),
+
+    layer{
+      name: "Inner1"
+      type: kInnerProduct
+      srclayers:"mnist"
+      innerproduct_conf{
+        num_output: 1000
+      }
+      param{
+        name: "w0"
+      }
+      param{
+        name: "b1"
+      }
+    }
+
+    layer{
+      name: "Sigmoid1"
+      type: kSigmoid
+      srclayers:"Inner1"
+    }
+
+    layer{
+      name: "RBMVis"
+      type: kRBMVis
+      srclayers:"sigmoid1"
+      srclayers:"RBMHid"
+      rbmvis_conf{
+        num_output: 500
+      }
+      param{
+        name: "w1"
+        init{
+          type: kGaussian
+          mean: 0.0
+          std: 0.1
+        }
+      }
+      param{
+        name: "b2"
+        init{
+          type: kConstant
+          value: 0.0
+        }
+      }
+    }
+
+    layer{
+      name: "RBMHid"
+      type: kRBMHid
+      srclayers:"RBMVis"
+      rbmhid_conf{
+        hid_dim: 500
+      }
+      param{
+        name: "w1_"
+        share_from: "w1"
+      }
+      param{
+        name: "b3"
+        init{
+          type: kConstant
+          value: 0.0
+        }
+      }
+    }
+
+To load `w0` and `b1` from RBM0's checkpoint file, we configure `checkpoint_path` as,
+
+    checkpoint_path: "SINGA_ROOT/rbm0/checkpoint/step6000-worker0.bin"
+    cluster{
+      workspace: "SINGA_ROOT/rbm1"
+    }
+
+The workspace is changed so that `w1`, `b2` and `b3` are checkpointed into *SINGA_ROOT/rbm1/*.
+
+### RBM2
+
+<img src="http://singa.incubator.apache.org/assets/image/RBM2_new.PNG" align="center" width="200px"/>
+<span><strong>Figure 3 - RBM2.</strong></span>
+
+
+
+Figure 3 shows the net structure for training RBM2. In this model, a layer with
+250 units is added as the hidden layer of RBM2. The visible units of RBM2
+accept the output from the Sigmoid2 layer. The parameters of Inner1 and Inner2 are set to
+`w0`, `b1`, `w1` and `b2`, which can be loaded from the checkpoint file of RBM1 under *SINGA_ROOT/rbm1/*.
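+
+Following the same pattern as RBM1, the checkpoint and workspace configuration
+for RBM2 would look roughly like the sketch below; the step number in the
+checkpoint file name is an assumption and should match the actual file under
+*SINGA_ROOT/rbm1/checkpoint/*,
+
+    # rbm2.conf (sketch)
+    checkpoint_path: "SINGA_ROOT/rbm1/checkpoint/step6000-worker0.bin"
+    cluster{
+      workspace: "SINGA_ROOT/rbm2"
+    }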
+
+### RBM3
+
+
+<img src="http://singa.incubator.apache.org/assets/image/RBM3_new.PNG" align="center" width="200px"/>
+<span><strong>Figure 4 - RBM3.</strong></span>
+
+
+
+Figure 4 shows the net structure for training RBM3. It is similar to Figure 3,
+but according to [Hinton's science
+paper](http://www.cs.toronto.edu/~hinton/science.pdf), the hidden units of the
+top RBM (RBM3) have stochastic real-valued states drawn from a unit-variance
+Gaussian whose mean is determined by the input from the RBM's logistic visible
+units. So we add a `gaussian` field to the RBMHid layer to control the
+sampling distribution (Gaussian or Bernoulli). In addition, this
+RBM uses a much smaller learning rate (0.001). The neural net configuration for
+RBM3 and the updating protocol are (with the data layer and parser
+layer omitted),
+
+    # Updater Configuration
+    updater{
+      type: kSGD
+      momentum: 0.9
+      weight_decay: 0.0002
+      learning_rate{
+        base_lr: 0.001
+        type: kFixed
+      }
+    }
+
+    layer{
+      name: "RBMVis"
+      type: kRBMVis
+      srclayers:"sigmoid3"
+      srclayers:"RBMHid"
+      rbmvis_conf{
+        num_output: 30
+      }
+      param{
+        name: "w3"
+        init{
+          type: kGaussian
+          mean: 0.0
+          std: 0.1
+        }
+      }
+      param{
+        name: "b6"
+        init{
+          type: kConstant
+          value: 0.0
+        }
+      }
+    }
+
+    layer{
+      name: "RBMHid"
+      type: kRBMHid
+      srclayers:"RBMVis"
+      rbmhid_conf{
+        hid_dim: 30
+        gaussian: true
+      }
+      param{
+        name: "w3_"
+        share_from: "w3"
+      }
+      param{
+        name: "b7"
+        init{
+          type: kConstant
+          value: 0.0
+        }
+      }
+    }
+
+### Auto-encoder
+In the fine-tuning stage, the 4 RBMs are "unfolded" to form encoder and decoder
+networks, which are initialized using the parameters learned in the pre-training stage.
+
+<img src="http://singa.incubator.apache.org/assets/image/autoencoder_new.PNG" align="center" width="500px"/>
+<span><strong>Figure 5 - Auto-Encoder.</strong></span>
+
+
+Figure 5 shows the neural net structure for training the auto-encoder.
+[Back propagation (kBP)](http://singa.incubator.apache.org/docs/train-one-batch/) is
+configured as the algorithm for `TrainOneBatch`. We use the same cluster
+configuration as for the RBM models. For the updater, we use the [AdaGrad](http://singa.incubator.apache.org/docs/updater#adagradupdater) algorithm with a
+fixed learning rate.
+
+    ### Updater Configuration
+    updater{
+      type: kAdaGrad
+      learning_rate{
+        base_lr: 0.01
+        type: kFixed
+      }
+    }
+
+
+
+According to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf),
+we configure a EuclideanLoss layer to compute the reconstruction error. The neural net
+configuration is (with some of the middle layers omitted),
+
+    layer{ name: "data" }
+    layer{ name:"mnist" }
+    layer{
+      name: "Inner1"
+      param{ name: "w0" }
+      param{ name: "b1" }
+    }
+    layer{ name: "sigmoid1" }
+    ...
+    layer{
+      name: "Inner8"
+      innerproduct_conf{
+        num_output: 784
+        transpose: true
+      }
+      param{
+        name: "w8"
+        share_from: "w1"
+      }
+      param{ name: "b0" }
+    }
+    layer{ name: "sigmoid8" }
+    ### Euclidean Loss Layer Configuration
+    layer{
+      name: "loss"
+      type:kEuclideanLoss
+      srclayers:"sigmoid8"
+      srclayers:"mnist"
+    }
+
+To load the pre-trained parameters from the 4 RBMs' checkpoint files, we configure `checkpoint_path` as,
+
+    ### Checkpoint Configuration
+    checkpoint_path: "examples/rbm/checkpoint/rbm0/checkpoint/step6000-worker0.bin"
+    checkpoint_path: "examples/rbm/checkpoint/rbm1/checkpoint/step6000-worker0.bin"
+    checkpoint_path: "examples/rbm/checkpoint/rbm2/checkpoint/step6000-worker0.bin"
+    checkpoint_path: "examples/rbm/checkpoint/rbm3/checkpoint/step6000-worker0.bin"
+
+
+## Visualization Results
+
+<div>
+<img src="http://singa.incubator.apache.org/assets/image/rbm-weight.PNG" align="center" width="300px"/>
+
+<img src="http://singa.incubator.apache.org/assets/image/rbm-feature.PNG" align="center" width="300px"/>
+<br/>
+<span><strong>Figure 6 - Bottom RBM weight matrix.</strong></span>
+&nbsp;
+&nbsp;
+&nbsp;
+&nbsp;
+
+<span><strong>Figure 7 - Top layer features.</strong></span>
+</div>
+
+Figure 6 visualizes sample columns of the weight matrix of RBM0. We can see that
+Gabor-like filters are learned. Figure 7 depicts the features extracted from
+the top layer of the auto-encoder, wherein one point represents one image.
+Different colors represent different digits. We can see that most images are
+well clustered according to the ground truth.
+
+


