Subject: svn commit: r977960 [2/3] - in /websites/staging/singa/trunk/content: ./ docs/zh/
Date: Mon, 18 Jan 2016 07:44:36 -0000
To: commits@singa.incubator.apache.org
From: buildbot@apache.org

Added: websites/staging/singa/trunk/content/docs/zh/installation_source.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/zh/installation_source.html (added)
+++ websites/staging/singa/trunk/content/docs/zh/installation_source.html Mon Jan 18 07:44:36 2016
Installing SINGA from source

Dependencies

SINGA is developed and tested on Linux. The following dependency libraries are required to install SINGA:
  • glog version 0.3.3
  • google-protobuf version 2.6.0
  • openblas version >= 0.2.10
  • zeromq version >= 3.2
  • czmq version >= 3
  • zookeeper version 3.4.6

Optional dependencies include:

  • lmdb version 0.9.10

You can install all dependency libraries into the $PREFIX folder with the following commands:
# make sure you are in the thirdparty folder
cd thirdparty
./install.sh all $PREFIX

If $PREFIX is not a system path (e.g., /usr/local/), please export the following variables before continuing the installation:

export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
export LIBRARY_PATH=$PREFIX/lib:$LIBRARY_PATH
export PATH=$PREFIX/bin:$PATH

Details about using this script are given later in this page.


Installing SINGA from source

SINGA is built using GNU autotools and requires GCC (version >= 4.8). There are two ways to install SINGA.

  • If you want to use the latest code, clone it from GitHub and build it with the following commands:

    $ git clone git@github.com:apache/incubator-singa.git
    $ cd incubator-singa
    $ ./autogen.sh
    $ ./configure
    $ make

Note: due to an oversight on our part, the SINGA repository under the nusinga account was not deleted after the project joined the Apache Incubator, but it is no longer updated. We apologize for any inconvenience this may have caused.

  • If you downloaded a release package, install it with the following commands:

    $ tar xvf singa-xxx
    $ cd singa-xxx
    $ ./configure
    $ make

    Some features of SINGA depend on external libraries. These features can be compiled with --enable-<feature>. For example, to compile SINGA with lmdb support, run:

    $ ./configure --enable-lmdb

After SINGA is compiled successfully, libsinga.so and the executable singa are generated under the .libs/ folder.
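
A quick way to check the build output (a sketch; paths are relative to the SINGA source root):

# confirm that the shared library and the executable were generated
$ ls .libs/libsinga.so .libs/singa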


If a dependency library is missing (or not detected), it can be downloaded and installed with the following script:

# must go to the thirdparty folder
$ cd thirdparty
$ ./install.sh LIB_NAME PREFIX

If no installation path is specified, the library will be installed into the default path of that software. For example, to install zeromq into the default system folder, run:


$ ./install.sh zeromq

Or, to install it into another directory:

$ ./install.sh zeromq PREFIX

You can also install all dependency libraries into /usr/local:

$ ./install.sh all /usr/local

The table below shows the first argument for each dependency library:


LIB_NAME              LIBRARY
czmq*                 czmq lib
glog                  glog lib
lmdb                  lmdb lib
OpenBLAS              OpenBLAS lib
protobuf              Google protobuf
zeromq                zeromq lib
zookeeper             Apache zookeeper

*: Because czmq depends on zeromq, the install script takes one extra argument that indicates the location of zeromq. The command to install czmq is:


$ ./install.sh czmq /usr/local -f=/usr/local/zeromq

After running this command, czmq will be installed under /usr/local; the last path specifies where zeromq is located.


FAQ

  • Q1: I get the error "./configure --> cannot find blas_segmm() function" even though OpenBLAS is installed.

A1: This error means the compiler cannot find OpenBLAS. If you installed it under $PREFIX (e.g., /opt/OpenBLAS), you need to export its path as follows:


  $ export LIBRARY_PATH=$PREFIX/lib:$LIBRARY_PATH
  # e.g.,
  $ export LIBRARY_PATH=/opt/OpenBLAS/lib:$LIBRARY_PATH

  • Q2: I get the error "cblas.h no such file or directory exists".

A2: You need to include the folder containing cblas.h in CPLUS_INCLUDE_PATH, e.g.:


  $ export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
  # e.g.,
  $ export CPLUS_INCLUDE_PATH=/opt/OpenBLAS/include:$CPLUS_INCLUDE_PATH
  # then reconfigure and make SINGA
  $ ./configure
  $ make

  • Q3: While compiling SINGA, I get the error "SSE2 instruction set not enabled".

A3: You can try the following command:


  $ make CFLAGS='-msse2' CXXFLAGS='-msse2'

  • Q4: When I try to import .py files, I get "ImportError: cannot import name enum_type_wrapper" from google.protobuf.internal.

A4: After installing Google protobuf via "make install", you should also install its Python runtime libraries. In the protobuf source folder, run:


  $ cd /PROTOBUF/SOURCE/FOLDER
  $ cd python
  $ python setup.py build
  $ python setup.py install

You may need sudo if you install the Python runtime libraries into a system folder.

  • Q5: I get linking errors caused by gflags.

A5: SINGA does not depend on gflags, but you may have installed glog together with gflags. In that case, reinstall glog into another folder using thirdparty/install.sh, and export that folder in LDFLAGS and CPPFLAGS, as sketched below.

  • Q6: While building SINGA and installing glog on Mac OS X, I get the fatal error 'ext/slist' file not found.

A6: Please install glog separately and then try the following command:

  $ make CFLAGS='-stdlib=libstdc++' CXXFLAGS='-stdlib=libstdc++'

  • Q7: When I start a training job, the program reports the error "ZOO_ERROR...zk retcode=-4...".

A7: This is because zookeeper has not been started. Please start the zookeeper service:


  $ ./bin/zk-service start

If the error persists, Java may not be installed; you can check with:

  $ java --version

  • Q8: When I install OpenBLAS from source, I am told that a Fortran compiler is needed.

A8: Compile OpenBLAS with:


  $ make ONLY_CBLAS=1

Or install it with apt-get:

    $ sudo apt-get install openblas-dev

Or:

    $ sudo yum install openblas-devel

The latter two commands require root permission. Note that after installing OpenBLAS, you should set the environment variables to include its header and library paths (see the Dependencies section).

  • Q9: When I install protocol buffer, I am told "GLIBC++_3.4.20 not found in /usr/lib64/libstdc++.so.6".

A9: This means the linker found libstdc++.so.6, but that file is older than the GCC version used to compile and link the program. The program requires the libstdc++ that belongs to the newer GCC, so the linker must be told how to find the newer libstdc++ shared library. The simplest solution is to find the correct libstdc++ and export it in LD_LIBRARY_PATH. For example, if GLIBC++_3.4.20 is listed by the following command,


  $ strings /usr/local/lib64/libstdc++.so.6 | grep GLIBC++

then you just need to set your environment variable:

  $ export LD_LIBRARY_PATH=/usr/local/lib64:$LD_LIBRARY_PATH

  • Q10: While compiling glog, I get the error "src/logging_unittest.cc:83:20: error: 'gflags' is not a namespace-name".

A10: This is probably because the installed gflags uses a namespace other than gflags, e.g., 'google', so glog cannot find the 'gflags' namespace.


Compiling glog does not require gflags; you can modify the configure.ac file to ignore gflags:

  1. cd to the glog src directory
  2. change line 125 of configure.ac to "AC_CHECK_LIB(gflags, main, ac_cv_have_libgflags=0, ac_cv_have_libgflags=0)"
  3. autoreconf

Then recompile glog; a possible command sequence is sketched below.
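
A shell sketch of the three steps above, assuming the gflags check is still on line 125 of configure.ac (check your glog version before editing):

  # inside the glog source directory
  $ sed -i '125s/.*/AC_CHECK_LIB(gflags, main, ac_cv_have_libgflags=0, ac_cv_have_libgflags=0)/' configure.ac
  $ autoreconf
  $ ./configure && make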

Added: websites/staging/singa/trunk/content/docs/zh/mlp.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/zh/mlp.html (added)
+++ websites/staging/singa/trunk/content/docs/zh/mlp.html Mon Jan 18 07:44:36 2016

MLP Example

+
+

Multilayer perceptron (MLP) is a subclass of feed-forward neural networks. An MLP typically consists of multiple layers, each fully connected to the next one. In this example, we will use SINGA to train a simple MLP model proposed by Ciresan for classifying handwritten digits from the MNIST dataset.

+
+

Running instructions

+

Please refer to the installation page for instructions on building SINGA, and the quick start for instructions on starting zookeeper.
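
If zookeeper is not running yet, it can be started with the script shipped with SINGA (a sketch; run from the SINGA root folder, as in the installation FAQ):

# start the zookeeper service used by SINGA
$ ./bin/zk-service start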

+

We have provided scripts for preparing the training and test datasets in examples/mnist/.


# in examples/mnist
$ cp Makefile.example Makefile
$ make download
$ make create
+

Training on CPU

+

After the datasets are prepared, we start the training by

+ +
+
./bin/singa-run.sh -conf examples/mnist/job.conf
+
+

After it is started, you should see output like

+ +
+
Record job information to /tmp/singa-log/job-info/job-1-20150817-055231
+Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1
+E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073)
+E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start
+E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start
+E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, accuracy : 0.109100
+E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, accuracy : 0.099000
+E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800
+E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 2.091030, accuracy : 0.327200
+E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 1.969412, accuracy : 0.442100
+E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 1.865466, accuracy : 0.514800
+E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 1.773849, accuracy : 0.569100
+E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, accuracy : 0.662100
+E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 1.659150, accuracy : 0.652600
+E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 1.574024, accuracy : 0.666000
+E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 1.529380, accuracy : 0.670500
+E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 1.443911, accuracy : 0.703500
+E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000
+E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500
+E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900
+
+

After training for a certain number of steps (depending on the configuration) or when the job is finished, SINGA will checkpoint the model parameters.

+
+

Training on GPU

+

To train this example model on GPU, just add a field in the configuration file for the GPU device,


# job.conf
gpu: 0
+

Training using Python script

+

The Python helpers that come with SINGA 0.2 make it easy to configure the job. For example, the job.conf is replaced with a simple Python script, mnist_mlp.py, which has about 30 lines of code following the Keras API.

+ +
+
./bin/singa-run.sh -exec tool/python/examples/mnist_mlp.py
+
+
+

Details

+

To train a model in SINGA, you need to prepare the datasets and a job configuration which specifies the neural net structure, the training algorithm (BP or CD), the SGD update algorithm (e.g., Adagrad), the number of training/test steps, etc.
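
As an overview before the sub-sections below, a job configuration roughly has the following shape. This is only a sketch: the name and step fields and the enclosing neuralnet block are assumptions here, while the train_one_batch, updater and cluster blocks are shown in full later on this page.

# job.conf skeleton (sketch only; fields outside the blocks below are assumptions)
name: "mlp"           # job name (assumed field)
train_steps: 1000     # number of training steps (assumed field)
test_steps: 10        # number of test steps per test phase (assumed field)
test_freq: 60         # run a test phase every 60 training steps (assumed field)

neuralnet {           # layer configurations, see the "Neural net" section
  # layer { ... }
}

train_one_batch {     # training algorithm, see "TrainOneBatch algorithm"
  alg: kBP
}

updater {             # SGD settings, see "Updater"
  # type, learning_rate, ...
}

cluster {             # worker/server settings, see "Cluster setting"
  # nworker_groups, nserver_groups, ...
}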

+
+

Data preparation

+

Before using SINGA, you need to write a program to pre-process your dataset into a format that SINGA can read. Please refer to the Data Preparation page for details about preparing this MNIST dataset.

+
+

Neural net

Figure 1 - Net structure of the MLP example.

Figure 1 shows the structure of the simple MLP model, which is constructed following Ciresan’s paper. The dashed circle contains two layers which represent one feature transformation stage. There are 6 such stages in total. The sizes of the InnerProductLayers in these circles decrease from 2500->2000->1500->1000->500->10.

+

Next, we follow the guides on the neural net page and the layer page to write the neural net configuration.

  • We configure an input layer to read the training/testing records from a disk file.

    layer {
        name: "data"
        type: kRecordInput
        store_conf {
          backend: "kvfile"
          path: "examples/mnist/train_data.bin"
          random_skip: 5000
          batchsize: 64
          shape: 784
          std_value: 127.5
          mean_value: 127.5
        }
        exclude: kTest
    }

    layer {
        name: "data"
        type: kRecordInput
        store_conf {
          backend: "kvfile"
          path: "examples/mnist/test_data.bin"
          batchsize: 100
          shape: 784
          std_value: 127.5
          mean_value: 127.5
        }
        exclude: kTrain
    }
  • All InnerProductLayers are configured similarly as,

    layer{
      name: "fc1"
      type: kInnerProduct
      srclayers:"data"
      innerproduct_conf{
        num_output: 2500
      }
      param{
        name: "w1"
        ...
      }
      param{
        name: "b1"
        ...
      }
    }

    with the num_output decreasing from 2500 to 10.

  • An STanhLayer is connected to every InnerProductLayer except the last one. It transforms the features via the scaled tanh function.

    layer{
      name: "tanh1"
      type: kSTanh
      srclayers:"fc1"
    }

  • The final Softmax loss layer connects to the LabelLayer and the last STanhLayer.

    layer{
      name: "loss"
      type:kSoftmaxLoss
      softmaxloss_conf{ topk:1 }
      srclayers:"fc6"
      srclayers:"data"
    }
+

Updater

+

The normal SGD updater is selected. The learning rate is multiplied by 0.997 every 60 steps (i.e., one epoch).


updater{
  type: kSGD
  learning_rate{
    base_lr: 0.001
    type : kStep
    step_conf{
      change_freq: 60
      gamma: 0.997
    }
  }
}
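
Assuming the kStep schedule multiplies the learning rate by gamma every change_freq steps (which matches the description above), the effective learning rate at training step t is

  \eta_t = \text{base\_lr} \cdot \gamma^{\lfloor t/60 \rfloor} = 0.001 \cdot 0.997^{\lfloor t/60 \rfloor}

so after 600 steps the learning rate is about 0.001 * 0.997^10 ≈ 0.00097.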

TrainOneBatch algorithm

+

The MLP model is a feed-forward model, hence the Back-Propagation algorithm is selected.


train_one_batch {
  alg: kBP
}
+

Cluster setting

+

The following configuration sets a single worker and a single server for training. The training frameworks page introduces configurations for a couple of distributed training frameworks.


cluster {
  nworker_groups: 1
  nserver_groups: 1
}
Added: websites/staging/singa/trunk/content/docs/zh/neural-net.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/zh/neural-net.html (added)
+++ websites/staging/singa/trunk/content/docs/zh/neural-net.html Mon Jan 18 07:44:36 2016

Neural Net

+
+

NeuralNet in SINGA represents an instance of a user’s neural net model. As a neural net typically consists of a set of layers, NeuralNet comprises a set of unidirectionally connected Layers. This page describes how to convert a user’s neural net into the configuration of NeuralNet.

+

Figure 1 - Categorization of popular deep learning models.

+
+

Net structure configuration

+

Users configure the NeuralNet by listing all layers of the neural net and specifying each layer’s source layer names. Popular deep learning models can be categorized as shown in Figure 1. The subsequent sections give details for each category.

+
+

Feed-forward models

Figure 2 - Net structure of an MLP model.

Feed-forward models, e.g., CNN and MLP, can easily be configured, as their layer connections are directed and contain no cycles. The configuration for the MLP model shown in Figure 2 is as follows,


net {
  layer {
    name : "data"
    type : kData
  }
  layer {
    name : "image"
    type : kImage
    srclayer: "data"
  }
  layer {
    name : "label"
    type : kLabel
    srclayer: "data"
  }
  layer {
    name : "hidden"
    type : kHidden
    srclayer: "image"
  }
  layer {
    name : "softmax"
    type : kSoftmaxLoss
    srclayer: "hidden"
    srclayer: "label"
  }
}
+

Energy models

+

Figure 3 - Convert connections in RBM and RNN.

+

For energy models including RBM, DBM, etc., their connections are undirected (i.e., Category B). To represent these models using NeuralNet, users can simply replace each connection with two directed connections, as shown in Figure 3a. In other words, for each pair of connected layers, their source layer fields should include each other’s name. The full RBM example has a detailed neural net configuration for an RBM model, which looks like


net {
  layer {
    name : "vis"
    type : kVisLayer
    param {
      name : "w1"
    }
    srclayer: "hid"
  }
  layer {
    name : "hid"
    type : kHidLayer
    param {
      name : "w2"
      share_from: "w1"
    }
    srclayer: "vis"
  }
}
+

RNN models

+

For recurrent neural networks (RNN), users can remove the recurrent connections by unrolling the recurrent layer. For example, in Figure 3b, the original layer is unrolled into a new layer with 4 internal layers. In this way, the model is like a normal feed-forward model, thus can be configured similarly. The RNN example has a full neural net configuration for a RNN model.
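
A purely illustrative sketch of this unrolling, written with the configuration constructs used on this page (the layer name and type below are hypothetical; see the RNN example for the actual configuration). Each unrolled copy takes the previous copy as its source layer and shares its parameters through share_from:

layer {
  name: "recur0"
  type: kHiddenRNN          # hypothetical layer type
  param { name: "w_rec" }
  srclayer: "data"
}
layer {
  name: "recur1"
  type: kHiddenRNN          # hypothetical layer type
  param { name: "w_rec_1" share_from: "w_rec" }
  srclayer: "recur0"
}
# ... two more unrolled copies (recur2, recur3) configured in the same way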

+
+

Configuration for multiple nets

+

Typically, a training job includes three neural nets for the training, validation and test phase respectively. The three neural nets share most layers except the data layer, loss layer or output layer, etc. To avoid redundant configurations for the shared layers, users can use the exclude field to filter out a layer in the neural net, e.g., the following layer will be filtered when creating the testing NeuralNet.


layer {
  ...
  exclude : kTest # filter this layer for creating test net
}
+

Neural net partitioning

+

A neural net can be partitioned in different ways to distribute the training over multiple workers.

+
+

Batch and feature dimension

+

Figure 4 - Partitioning of a fully connected layer.

+

Every layer’s feature blob is considered a matrix whose rows are feature vectors. Thus, one layer can be split on two dimensions. Partitioning on dimension 0 (also called batch dimension) slices the feature matrix by rows. For instance, if the mini-batch size is 256 and the layer is partitioned into 2 sub-layers, each sub-layer would have 128 feature vectors in its feature blob. Partitioning on this dimension has no effect on the parameters, as every Param object is replicated in the sub-layers. Partitioning on dimension 1 (also called feature dimension) slices the feature matrix by columns. For example, suppose the original feature vector has 50 units, after partitioning into 2 sub-layers, each sub-layer would have 25 units. This partitioning may result in Param object being split, as shown in Figure 4. Both the bias vector and weight matrix are partitioned into two sub-layers.

+
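
In symbols, for a feature blob with mini-batch size n and feature dimension d split into k sub-layers:

  X \in \mathbb{R}^{n \times d} \;\xrightarrow{\ \text{dim } 0\ }\; k \text{ sub-blobs of shape } \tfrac{n}{k} \times d \quad (\text{Param objects replicated})

  X \in \mathbb{R}^{n \times d} \;\xrightarrow{\ \text{dim } 1\ }\; k \text{ sub-blobs of shape } n \times \tfrac{d}{k} \quad (\text{weight matrix and bias split})

With the numbers above, n = 256 and k = 2 give 128 feature vectors per sub-layer, and d = 50 and k = 2 give 25 units per sub-layer.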
+

Partitioning configuration

+

There are 4 partitioning schemes, whose configurations are given below,

  1. Partitioning each single layer into sub-layers on the batch dimension (see below). It is enabled by setting the partition dimension of the layer to 0, e.g.,

     # with other fields omitted
     layer {
       partition_dim: 0
     }

  2. Partitioning each single layer into sub-layers on the feature dimension (see below). It is enabled by setting the partition dimension of the layer to 1, e.g.,

     # with other fields omitted
     layer {
       partition_dim: 1
     }

  3. Partitioning all layers into different subsets. It is enabled by configuring the location ID of a layer, e.g.,

     # with other fields omitted
     layer {
       location: 1
     }
     layer {
       location: 0
     }
  4. Hybrid partitioning of strategies 1, 2 and 3. Hybrid partitioning is useful for large models. An example application is to implement the idea proposed by Alex. Hybrid partitioning is configured like,

     # with other fields omitted
     layer {
       location: 1
     }
     layer {
       location: 0
     }
     layer {
       partition_dim: 0
       location: 0
     }
     layer {
       partition_dim: 1
       location: 0
     }

Currently SINGA supports strategy-2 well. Other partitioning strategies are under test and will be released in a later version.

+
+

Parameter sharing

+

Parameters can be shared in two cases,

+ +
    + +
  • +

    sharing parameters among layers via user configuration. For example, the visible layer and hidden layer of an RBM share the weight matrix, which is configured through the share_from field as shown in the above RBM configuration. The configurations must be the same (except name) for shared parameters.

  • + +
  • +

    due to neural net partitioning, some Param objects are replicated into different workers, e.g., partitioning one layer on batch dimension. These workers share parameter values. SINGA controls this kind of parameter sharing automatically, users do not need to do any configuration.

  • + +
  • +

    the NeuralNet instances for training and testing (and validation) share most layers, and thus share Param values.

  • +
+

If the shared Param instances reside in the same process (possibly in different threads), they use the same chunk of memory for their values, but they have separate memory for their gradients. In fact, their gradients will be averaged by the stub or server.

+
+

Advanced user guide

+
+

Creation


static NeuralNet* NeuralNet::Create(const NetProto& np, Phase phase, int num);

The above function creates a NeuralNet for a given phase, and returns a pointer to the NeuralNet instance. The phase is in {kTrain, kValidation, kTest}. num is used for net partitioning which indicates the number of partitions. Typically, a training job includes three neural nets for training, validation and test phase respectively. The three neural nets share most layers except the data layer, loss layer or output layer, etc.. The Create function takes in the full net configuration including layers for training, validation and test. It removes layers for phases other than the specified phase based on the exclude field in layer configuration:


layer {
  ...
  exclude : kTest # filter this layer for creating test net
}

The filtered net configuration is passed to the constructor of NeuralNet:


NeuralNet::NeuralNet(NetProto netproto, int npartitions);

The constructor first creates a graph representing the net structure in


Graph* NeuralNet::CreateGraph(const NetProto& netproto, int npartitions);

Next, it creates a layer for each node and connects layers if their nodes are connected.


void NeuralNet::CreateNetFromGraph(Graph* graph, int npartitions);

Since the NeuralNet instance may be shared among multiple workers, the Create function returns a pointer to the NeuralNet instance.

+
+

Parameter sharing

+

Param sharing is enabled by first sharing the Param configuration (in NeuralNet::Create) to create two similar (e.g., the same shape) Param objects, and then calling (in NeuralNet::CreateNetFromGraph),


void Param::ShareFrom(const Param& from);

It is also possible to share Params of two nets, e.g., sharing parameters of the training net and the test net,


void NeuralNet::ShareParamsFrom(NeuralNet* other);

It will call Param::ShareFrom for each Param object.

+
+

Access functions

+

NeuralNet provides a couple of access functions to get the layers and params of the net:


const std::vector<Layer*>& layers() const;
const std::vector<Param*>& params() const;
Layer* name2layer(string name) const;
Param* paramid2param(int id) const;
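
A minimal usage sketch of these accessors, assuming a NetProto instance net_proto has already been parsed and the net is created with NeuralNet::Create as shown above (error handling omitted):

// create the training net with a single partition, then inspect it
NeuralNet* net = NeuralNet::Create(net_proto, kTrain, 1);
for (Layer* layer : net->layers()) {
  // iterate over all layers of the net, e.g., to run or inspect them
}
Layer* fc1 = net->name2layer("fc1");                 // look up a layer by its configured name
const std::vector<Param*>& params = net->params();   // all Param objects of the net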

Partitioning

+
+

Implementation

+

SINGA partitions the neural net in CreateGraph function, which creates one node for each (partitioned) layer. For example, if one layer’s partition dimension is 0 or 1, then it creates npartition nodes for it; if the partition dimension is -1, a single node is created, i.e., no partitioning. Each node is assigned a partition (or location) ID. If the original layer is configured with a location ID, then the ID is assigned to each newly created node. These nodes are connected according to the connections of the original layers. Some connection layers will be added automatically. For instance, if two connected sub-layers are located at two different workers, then a pair of bridge layers is inserted to transfer the feature (and gradient) blob between them. When two layers are partitioned on different dimensions, a concatenation layer which concatenates feature rows (or columns) and a slice layer which slices feature rows (or columns) would be inserted. These connection layers help making the network communication and synchronization transparent to the users.

+
+

Dispatching partitions to workers

+

Each (partitioned) layer is assigned a location ID, based on which it is dispatched to one worker. Particularly, the pointer to the NeuralNet instance is passed to every worker within the same group, but each worker only computes over the layers that have the same partition (or location) ID as the worker’s ID. When every worker computes the gradients of the entire model parameters (strategy-2), we refer to this process as data parallelism. When different workers compute the gradients of different parameters (strategy-3 or strategy-1), we call this process model parallelism. The hybrid partitioning leads to hybrid parallelism, where some workers compute the gradients of the same subset of model parameters while other workers compute on different model parameters. For example, to implement the hybrid parallelism for the DCNN model, we set partition_dim = 0 for lower layers and partition_dim = 1 for higher layers.
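
A configuration sketch of this hybrid setup, reusing the partition_dim field shown in the partitioning configuration section (the layer names are illustrative):

# lower layers: partitioned on the batch dimension (data parallelism)
layer {
  name: "conv1"
  partition_dim: 0
}
# higher layers: partitioned on the feature dimension (model parallelism)
layer {
  name: "fc1"
  partition_dim: 1
}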

+
+