svn commit: r986140 [21/34] - in /websites/staging/singa/trunk/content: ./ community/ develop/ docs/ docs/jp/ docs/kr/ docs/zh/ releases/ v0.1.0/ v0.2.0/ v0.2.0/jp/ v0.2.0/kr/ v0.2.0/zh/ v0.3.0/ v0.3.0/jp/ v0.3.0/kr/ v0.3.0/zh/
Date: Wed, 20 Apr 2016 05:12:08 -0000
From: buildbot@apache.org
To: commits@singa.incubator.apache.org

Added: websites/staging/singa/trunk/content/v0.3.0/kr/layer.html
==============================================================================
--- websites/staging/singa/trunk/content/v0.3.0/kr/layer.html (added)
+++ websites/staging/singa/trunk/content/v0.3.0/kr/layer.html Wed Apr 20 05:12:03 2016
@@ -0,0 +1,823 @@

Apache SINGA – Layers
+ + + + + +
+
+ +
+ + +
+ +

Layers

+
+

Layer is a core abstraction in SINGA. It performs a variety of feature transformations for extracting high-level features, e.g., loading raw features, parsing RGB values, doing convolution transformation, etc.

+

The Basic user guide section introduces the configuration of a built-in layer. Advanced user guide explains how to extend the base Layer class to implement users’ functions.

+
+

Basic user guide

+
+

Layer configuration

+

Configurations of two example layers are shown below:

+ +
+
layer {
+  name: "data"
+  type: kCSVRecord
+  store_conf { }
+}
+layer{
+  name: "fc1"
+  type: kInnerProduct
+  srclayers: "data"
+  innerproduct_conf{ }
+  param{ }
+}
+
+

There are some common fields for all kinds of layers:

+ +
    + +
  • name: a string used to differentiate two layers in a neural net.
  • + +
  • type: an integer used for identifying a specific Layer subclass. The types of built-in layers are listed in LayerType (defined in job.proto). For user-defined layer subclasses, user_type should be used instead of type.
  • + +
  • srclayers: names of the source layers. In SINGA, all connections are converted to directed connections.
  • + +
  • param: configuration for a Param instance. There can be multiple Param objects in one layer.
  • +
+

Different layers may have different configurations. These configurations are defined in <type>_conf. E.g., “fc1” layer has innerproduct_conf. The subsequent sections explain the functionality of each built-in layer and how to configure it.

+
+

Built-in Layer subclasses

+

SINGA has provided many built-in layers, which can be used directly to create neural nets. These layers are categorized according to their functionalities,

+ +
    + +
  • Input layers for loading records (e.g., images) from disk files, HDFS or network into memory.
  • + +
  • Neuron layers for feature transformation, e.g., convolution, pooling, dropout, etc.
  • + +
  • Loss layers for measuring the training objective loss, e.g., Cross Entropy loss or Euclidean loss.
  • + +
  • Output layers for outputting the prediction results (e.g., probabilities of each category) or features into persistent storage, e.g., disk or HDFS.
  • + +
  • Connection layers for connecting layers when the neural net is partitioned.
  • +
+
+

Input layers

+

Input layers load training/test data from disk or other places (e.g., HDFS or network) into memory.

+
+
StoreInputLayer
+

StoreInputLayer is a base layer for loading data from a data store. The data store can be a KVFile or TextFile (LMDB, LevelDB, HDFS, etc., will be supported later). Its ComputeFeature function reads batchsize (string:key, string:value) tuples. Each tuple is parsed by a Parse function implemented by its subclasses.

+

The configuration for this layer is in store_conf,

+ +
+
store_conf {
+  backend: # "kvfile" or "textfile"
+  path: # path to the data store
+  batchsize :
+  ...
+}
+
+
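For instance, a filled-in store_conf for reading training tuples from a KVFile could look like the sketch below; the path and batch size are copied from the MLP example later on this page and are only illustrative here.

store_conf {
  backend: "kvfile"
  path: "examples/mnist/train_data.bin"
  batchsize: 64
}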
+
SingleLabelRecordLayer
+

It is a subclass of StoreInputLayer. It assumes the (key, value) tuple loaded from a data store contains a feature vector (and a label) for one data instance. All feature vectors are of the same fixed length. The shape of one instance is configured through the shape field, e.g., the following configuration specifies the shape for the CIFAR10 images.

+ +
+
store_conf {
+  shape: 3  #channels
+  shape: 32 #height
+  shape: 32 #width
+}
+
+

It may do some preprocessing, like standardization. The data is loaded and parsed in a virtual function, which is implemented by its subclasses.

+
+
RecordInputLayer
+

It is a subclass of SingleLabelRecordLayer. It parses the value field from one tuple into a RecordProto, which is generated by Google Protobuf according to common.proto. It can be used to store features for images (e.g., using the pixel field) or other objects (using the data field). The key field is not parsed.

+ +
+
type: kRecordInput
+store_conf {
+  has_label: # default is true
+  ...
+}
+
+
+
CSVInputLayer
+

It is a subclass of SingleLabelRecordLayer. The value field from one tuple is parsed as a CSV line (fields separated by commas). The first number is parsed as a label if has_label is set in store_conf; otherwise, all numbers are parsed into one row of the data_ Blob.

+ +
+
type: kCSVInput
+store_conf {
+  has_label: # default is true
+  ...
+}
+
+
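As an illustration, a complete CSV input layer might be configured as sketched below; the layer name and file path are hypothetical, while the fields come from the store_conf options described above.

layer {
  name: "csv-data"
  type: kCSVInput
  store_conf {
    backend: "textfile"
    path: "examples/data/train.csv"   # hypothetical path to a CSV text file
    batchsize: 100
    has_label: true
  }
}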
+
ImagePreprocessLayer
+

This layer does image preprocessing, e.g., cropping, mirroring and scaling, on the data Blob from its source layer. It replaces the deprecated RGBImageLayer, which works on the Record from ShardDataLayer, and it uses the same configuration as RGBImageLayer:

+ +
+
type: kImagePreprocess
+rgbimage_conf {
+  scale: float
+  cropsize: int  # cropping each image to keep the central part with this size
+  mirror: bool  # mirror the image by setting image[i,j]=image[i,len-j]
+  meanfile: "Image_Mean_File_Path"
+}
+
+
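A filled-in sketch is shown below; the concrete values and the mean file path are hypothetical and only illustrate the fields described above.

type: kImagePreprocess
rgbimage_conf {
  scale: 1.0
  cropsize: 28       # keep the central 28x28 region of each image
  mirror: true       # mirror images horizontally
  meanfile: "examples/cifar10/image_mean.bin"   # hypothetical path to the mean record
}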
+
ShardDataLayer (Deprecated)
+

Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.

+

ShardDataLayer is a subclass of DataLayer that reads Records from a disk file. The file should be created using the DataShard class. With the data file prepared, users configure the layer as

+ +
+
type: kShardData
+sharddata_conf {
+  path: "path to data shard folder"
+  batchsize: int
+  random_skip: int
+}
+
+

batchsize specifies the number of records in one mini-batch. The first rand() % random_skip Records are skipped at the first iteration; this enforces that different workers work on different Records.

+
+
LMDBDataLayer (Deprecated)
+

Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.

+

LMDBDataLayer is similar to ShardDataLayer, except that the Records are loaded from LMDB.

+ +
+
type: kLMDBData
+lmdbdata_conf {
+  path: "path to LMDB folder"
+  batchsize: int
+  random_skip: int
+}
+
+
+
ParserLayer (Deprecated)
+

Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.

+

It gets a vector of Records from the DataLayer and parses features into a Blob.

+ +
+
virtual void ParseRecords(Phase phase, const vector<Record>& records, Blob<float>* blob) = 0;
+
+
+
LabelLayer (Deprecated)
+

Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.

+

LabelLayer is a subclass of ParserLayer. It parses a single label from each Record. Consequently, it will put $b$ (mini-batch size) values into the Blob. It has no specific configuration fields.

+
+
MnistImageLayer (Deprecated)
+

Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer. MnistImageLayer is a subclass of ParserLayer. It parses the pixel values of each image from the MNIST dataset. The pixel values may be normalized as x/norm_a - norm_b. For example, if norm_a is set to 255 and norm_b is set to 0, then every pixel will be normalized into [0, 1].

+ +
+
type: kMnistImage
+mnistimage_conf {
+  norm_a: float
+  norm_b: float
+}
+
+
+
RGBImageLayer (Deprecated)
+

Deprecated! Please use the ImagePreprocessLayer. RGBImageLayer is a subclass of ParserLayer. It parses the RGB values of one image from each Record. It may also apply some transformations, e.g., cropping and mirroring. If meanfile is specified, it should point to a path that contains one Record for the mean of each pixel over all training images.

+ +
+
type: kRGBImage
+rgbimage_conf {
+  scale: float
+  cropsize: int  # cropping each image to keep the central part with this size
+  mirror: bool  # mirror the image by setting image[i,j]=image[i,len-j]
+  meanfile: "Image_Mean_File_Path"
+}
+
+
+
PrefetchLayer
+

PrefetchLayer embeds other input layers to do data prefetching. It launches a thread that calls the embedded layers to load and extract features, so that the I/O task and the computation task can run simultaneously. One example PrefetchLayer configuration is,

+ +
+
layer {
+  name: "prefetch"
+  type: kPrefetch
+  sublayers {
+    name: "data"
+    type: kShardData
+    sharddata_conf { }
+  }
+  sublayers {
+    name: "rgb"
+    type: kRGBImage
+    srclayers:"data"
+    rgbimage_conf { }
+  }
+  sublayers {
+    name: "label"
+    type: kLabel
+    srclayers: "data"
+  }
+  exclude:kTest
+}
+
+

The layers on top of the PrefetchLayer should use the names of the embedded layers as their source layers. For example, “rgb” and “label” should be configured as the srclayers of other layers.

+
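For example, a convolution layer consuming the prefetched images would list the embedded “rgb” layer (not “prefetch”) as its source; the layer name below is hypothetical.

layer {
  name: "conv1"          # hypothetical layer on top of the prefetched data
  type: kConvolution
  srclayers: "rgb"       # source is the embedded layer, not "prefetch"
  convolution_conf { }
  ...
}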
+

Output Layers

+

Output layers get data from their source layers and write them to persistent storage, e.g., disk files or HDFS (to be supported).

+
+
RecordOutputLayer
+

This layer gets data (and label if it is available) from its source layer and converts it into records of type RecordProto. Records are written as (key = instance No., value = serialized record) tuples into Store, e.g., KVFile. The configuration of this layer should include the specifics of the Store backend via store_conf.

+ +
+
layer {
+  name: "output"
+  type: kRecordOutput
+  srclayers:
+  store_conf {
+    backend: "kvfile"
+    path:
+  }
+}
+
+
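A filled-in sketch follows; the source layer name and output path are hypothetical.

layer {
  name: "output"
  type: kRecordOutput
  srclayers: "softmax"    # hypothetical layer whose data should be written out
  store_conf {
    backend: "kvfile"
    path: "examples/output/predictions.bin"   # hypothetical output file
  }
}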
+
CSVOutputLayer
+

This layer gets data (and label if it is available) from its source layer and converts it into one string per instance, with fields separated by commas (i.e., CSV format). The shape information is not kept in the string. All strings are written into the Store, e.g., a text file. The configuration of this layer should include the specifics of the Store backend via store_conf.

+ +
+
layer {
+  name: "output"
+  type: kCSVOutput
+  srclayers:
+  store_conf {
+    backend: "textfile"
+    path:
+  }
+}
+
+
+

Neuron Layers

+

Neuron layers conduct feature transformations.

+
+
ConvolutionLayer
+

ConvolutionLayer conducts convolution transformation.

+ +
+
type: kConvolution
+convolution_conf {
+  num_filters: int
+  kernel: int
+  stride: int
+  pad: int
+}
+param { } # weight/filter matrix
+param { } # bias vector
+
+

num_filters specifies the number of filters to apply; kernel is the size of the (square) convolution kernel; stride is the distance between successive applications of the filter; pad adds a border of zeros of the given width around each image.

+
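For instance, a layer applying 64 filters of size 5x5 at stride 1, with a 2-pixel zero border around each image, could be sketched as follows; the values are illustrative only.

type: kConvolution
convolution_conf {
  num_filters: 64
  kernel: 5
  stride: 1
  pad: 2
}
param { }  # weight/filter matrix
param { }  # bias vector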
+
InnerProductLayer
+

InnerProductLayer is fully connected with its (single) source layer. Typically, it has two parameter fields, one for the weight matrix and the other for the bias vector. It rotates the feature of the source layer (by multiplying it with the weight matrix) and shifts it (by adding the bias vector).

+ +
+
type: kInnerProduct
+innerproduct_conf {
+  num_output: int
+}
+param { } # weight matrix
+param { } # bias vector
+
+
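The MLP example later on this page configures its first fully connected layer in exactly this way (the param names "w1" and "b1" are taken from that example; other param settings are elided):

layer {
  name: "fc1"
  type: kInnerProduct
  srclayers: "data"
  innerproduct_conf {
    num_output: 2500
  }
  param { name: "w1" }  # weight matrix
  param { name: "b1" }  # bias vector
}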
+
PoolingLayer
+

PoolingLayer down-samples the feature vectors from the source layer by taking the average or the maximum over local regions.

+ +
+
type: kPooling
+pooling_conf {
+  pool: AVE|MAX // choose either average pooling or max pooling
+  kernel: int   // size of the kernel filter
+  pad: int      // the padding size
+  stride: int   // the step length of the filter
+}
+
+

The pooling layer supports two methods: average pooling and max pooling. Use the enum AVE or MAX to choose the method; a sample configuration is sketched after the list below.

+ +
    + +
  • Max Pooling selects the max value for each filtering area as a point of the result feature blob.
  • + +
  • Average Pooling averages all values for each filtering area at a point of the result feature blob.
  • +
+
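For example, max pooling over 3x3 regions with stride 2 and no padding could be configured as sketched here; the values are illustrative only.

type: kPooling
pooling_conf {
  pool: MAX
  kernel: 3
  pad: 0
  stride: 2
}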
+
ReLULayer
+

ReLULayer has rectified linear neurons, which apply the transformation f(x) = max(0, x). It has no specific configuration fields.

+
+
STanhLayer
+

STanhLayer uses the scaled tanh as activation function, i.e., f(x)=1.7159047* tanh(0.6666667 * x). It has no specific configuration fields.

+
+
SigmoidLayer
+

SigmoidLayer uses the sigmoid (or logistic) function as its activation function, i.e., f(x) = sigmoid(x). It has no specific configuration fields.

+
+
Dropout Layer
+

DropoutLayer randomly drops some of its inputs. This scheme helps prevent deep learning models from over-fitting.

+ +
+
type: kDropout
+dropout_conf {
+  dropout_ratio: float # dropout probability
+}
+
+
+
LRNLayer
+

LRNLayer (Local Response Normalization) normalizes the feature values over adjacent channels.

+ +
+
type: kLRN
+lrn_conf {
+  local_size: int
+  alpha: float  // scaling parameter
+  beta: float   // exponent of the normalization
+}
+
+

local_size specifies the number of adjoining channels to sum over. For WITHIN_CHANNEL normalization, it means the side length of the spatial region to sum over.

+
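The values below are the ones popularized by AlexNet-style networks; they are shown only as an illustration, not as SINGA defaults.

type: kLRN
lrn_conf {
  local_size: 5
  alpha: 0.0001   // scaling parameter
  beta: 0.75      // exponent of the normalization
}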
+

Loss Layers

+

Loss layers measure the objective training loss.

+
+
SoftmaxLossLayer
+

SoftmaxLossLayer is a combination of the Softmax transformation and the Cross-Entropy loss. It first applies Softmax to get a prediction probability for each output unit (neuron) and then computes the cross-entropy against the ground truth. It is generally used as the final layer to generate labels for classification tasks.

+ +
+
type: kSoftmaxLoss
+softmaxloss_conf {
+  topk: int
+}
+
+

The configuration field topk selects the labels with the top-k probabilities as the prediction results, since it is tedious for users to inspect the prediction probability of every label.

+
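The MLP example later on this page uses this layer as follows; note that it takes both the prediction layer ("fc6") and the label-providing data layer as sources.

layer {
  name: "loss"
  type: kSoftmaxLoss
  softmaxloss_conf { topk: 1 }
  srclayers: "fc6"
  srclayers: "data"
}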
+

ConnectionLayer

+

Subclasses of ConnectionLayer are utility layers that connect other layers, e.g., when the neural net is partitioned.

+
+
ConcateLayer
+

ConcateLayer connects to more than one source layer and concatenates their feature blobs along a given dimension.

+ +
+
type: kConcate
+concate_conf {
+  concate_dim: int  // define the dimension
+}
+
+
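As a sketch, two hypothetical source layers could be concatenated along the feature dimension like this:

layer {
  name: "concat"
  type: kConcate
  srclayers: "fc1a"
  srclayers: "fc1b"
  concate_conf {
    concate_dim: 1   // concatenate along the feature dimension
  }
}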
+
SliceLayer
+

SliceLayer connects to more than one destination layer and slices its feature blob along a given dimension.

+ +
+
type: kSlice
+slice_conf {
+  slice_dim: int
+}
+
+
+
SplitLayer
+

SplitLayer connects to more than one destination layer and replicates its feature blob to each of them.

+ +
+
type: kSplit
+split_conf {
+  num_splits: int
+}
+
+
+
BridgeSrcLayer & BridgeDstLayer
+

BridgeSrcLayer & BridgeDstLayer are utility layers that assist data (e.g., feature or gradient) transfer when the neural net is partitioned. These two layers are added implicitly; users typically do not need to configure them in their neural net configuration.

+
+

OutputLayer

+

It writes the prediction results or the extracted features into files, HTTP streams or other places. Currently SINGA has not implemented any specific output layer.

+
+

Advanced user guide

+

The base Layer class is introduced in this section, followed by how to implement a new Layer subclass.

+
+

Base Layer class

+
+

Members

+ +
+
LayerProto layer_conf_;
+Blob<float> data_, grad_;
+vector<AuxType> aux_data_;
+
+

The base layer class keeps the user configuration in layer_conf_. Almost all layers have b (mini-batch size) feature vectors, which are stored in the data_ Blob (a Blob is a chunk of memory, a concept adopted from Caffe). Some layers have no feature vectors of their own; instead, they share the data of their source layers. The grad_ Blob stores the gradients of the objective loss w.r.t. the data_ Blob. It is needed by the BP algorithm, hence we put it in the base class. For the CD algorithm, the grad_ field is not used; instead, the layers of an RBM model may have one Blob for the positive-phase feature and one Blob for the negative-phase feature. For a recurrent layer in an RNN, one row of the feature blob corresponds to the feature of one internal layer. The aux_data_ member stores auxiliary data, e.g., image labels (with AuxType set to int). If images have a varying number of labels, AuxType can be defined as vector<int>. Currently, we hard-code AuxType to int; it will be added as a template argument of the Layer class later.

+

If a layer has parameters, these parameters are declared using type Param. Since some layers do not have parameters, we do not declare any Param in the base layer class.

+
+

Functions

+ +
+
virtual void Setup(const LayerProto& conf, const vector<Layer*>& srclayers);
+virtual void ComputeFeature(int flag, const vector<Layer*>& srclayers) = 0;
+virtual void ComputeGradient(int flag, const vector<Layer*>& srclayers) = 0;
+
+

The Setup function reads user configuration, i.e. conf, and information from source layers, e.g., mini-batch size, to set the shape of the data_ (and grad_) field as well as some other layer specific fields. Memory will not be allocated until computation over the data structure happens.

+

The ComputeFeature function evaluates the feature blob by transforming (e.g., by convolution or pooling) the features from the source layers. ComputeGradient computes the gradients of the parameters associated with this layer. Both functions are invoked by the TrainOneBatch function during training, hence they must be consistent with it. In particular, feed-forward and RNN models are trained with the BP algorithm, which requires each layer’s ComputeFeature function to compute data_ based on the source layers, and each layer’s ComputeGradient to compute the gradients of the parameters and of the source layers’ grad_. Energy models, e.g., RBM, are trained with the CD algorithm, which requires each layer’s ComputeFeature function to compute the feature vectors for the positive or negative phase, depending on the phase argument, and the ComputeGradient function to compute only the parameter gradients. Some layers, e.g., loss or output layers, can put the loss or prediction result into the metric argument, which is averaged and displayed periodically.

+
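For example, the feed-forward MLP example later on this page selects the BP algorithm with the configuration below; the ComputeFeature and ComputeGradient implementations of every layer in that net must then follow the BP contract described above.

train_one_batch {
  alg: kBP
}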
+

Implementing a new Layer subclass

+

Users can extend the Layer class or other subclasses to implement their own feature transformation logics as long as the two virtual functions are overridden to be consistent with the TrainOneBatch function. The Setup function may also be overridden to read specific layer configuration.

+

The RNNLM provides a couple of user-defined layers. You can refer to them as examples.

+
+

Layer specific protocol message

+

To implement a new layer, the first step is to define the layer-specific configuration. Suppose the new layer is FooLayer; the layer-specific Google protocol message FooLayerProto should be defined as

+ +
+
// in user.proto
+package singa;
+import "job.proto";
+message FooLayerProto {
+  optional int32 a = 1;  // specific fields to the FooLayer
+}
+
+

In addition, users need to extend the original LayerProto (defined in job.proto of SINGA) to include the foo_conf as follows.

+ +
+
extend LayerProto {
+  optional FooLayerProto foo_conf = 101;  // unique field id, reserved for extensions
+}
+
+

If there are multiple new layers, then each layer with specific configurations should have its own <type>_conf field and take one unique extension number. SINGA has reserved enough extension numbers, from 101 to 1000.

+ +
+
// job.proto of SINGA
+message LayerProto {
+  ...
+  extensions 101 to 1000;
+}
+
+

With user.proto defined, users can use protoc to generate the user.pb.cc and user.pb.h files. In users’ code, the extension fields can be accessed via,

+ +
+
auto conf = layer_proto_.GetExtension(foo_conf);
+int a = conf.a();
+
+

When defining configurations of the new layer (in job.conf), users should use user_type for its layer type instead of type. In addition, foo_conf should be enclosed in brackets.

+ +
+
layer {
+  name: "foo"
+  user_type: "kFooLayer"  # Note user_type of user-defined layers is string
+  [foo_conf] {      # Note there is a pair of [] for extension fields
+    a: 10
+  }
+}
+
+
+

New Layer subclass declaration

+

The new layer subclass can be implemented like the built-in layer subclasses.

+ +
+
class FooLayer : public singa::Layer {
+ public:
+  void Setup(const LayerProto& conf, const vector<Layer*>& srclayers) override;
+  void ComputeFeature(int flag, const vector<Layer*>& srclayers) override;
+  void ComputeGradient(int flag, const vector<Layer*>& srclayers) override;
+
+ private:
+  //  members
+};
+
+

Users must override the two virtual functions to be called by the TrainOneBatch for either BP or CD algorithm. Typically, the Setup function will also be overridden to initialize some members. The user configured fields can be accessed through layer_conf_ as shown in the above paragraphs.

+
+

New Layer subclass registration

+

The newly defined layer should be registered in main.cc by adding

+ +
+
driver.RegisterLayer<FooLayer, std::string>("kFooLayer"); // "kFooLayer" should be matched to layer configurations in job.conf.
+
+

After that, the NeuralNet can create instances of the new Layer subclass.

+
+
+
+ +
+ +
+
+
+ +

Copyright © 2015 The Apache Software Foundation. All rights reserved. Apache Singa, Apache, the Apache feather logo, and the Apache Singa project logos are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.

+
+ + +
+
Added: websites/staging/singa/trunk/content/v0.3.0/kr/mesos.html
==============================================================================
--- websites/staging/singa/trunk/content/v0.3.0/kr/mesos.html (added)
+++ websites/staging/singa/trunk/content/v0.3.0/kr/mesos.html Wed Apr 20 05:12:03 2016
@@ -0,0 +1,451 @@

Apache SINGA – Distributed Training on Mesos
+ + + + + +
+
+ +
+ + +
+ +

Distributed Training on Mesos

+

This guide explains how to start SINGA distributed training on a Mesos cluster. It assumes that both Mesos and HDFS are already running, and that every node has SINGA installed. We assume the architecture depicted below, in which the cluster nodes are Docker containers. Refer to the Docker guide for details on how to start individual nodes and set up the network connection between them (make sure weave is running at each node, and the cluster’s head node is running in container node0).

+

(Figure: the assumed cluster architecture, with Docker container nodes node0, node1, ... connected via weave)

+
+
+

Start HDFS and Mesos

+

Go inside each container, using docker exec -it nodeX /bin/bash, and configure it as follows:

+ +
    + +
  • +

    On container node0

    + +
    +
    hadoop namenode -format
    +hadoop-daemon.sh start namenode
    +/opt/mesos-0.22.0/build/bin/mesos-master.sh --work_dir=/opt --log_dir=/opt --quiet > /dev/null &
    +zk-service.sh start
    +
  • + +
  • +

    On container node1, node2, ...

    + +
    +
    hadoop-daemon.sh start datanode
    +/opt/mesos-0.22.0/build/bin/mesos-slave.sh --master=node0:5050 --log_dir=/opt --quiet > /dev/null &
    +
  • +
+

To check if the setup has been successful, verify that the HDFS namenode has registered N datanodes, via:

+ +
+
hadoop dfsadmin -report
+
+
+
+

Mesos logs

+

Mesos logs are stored at /opt/lt-mesos-master.INFO on node0 and at /opt/lt-mesos-slave.INFO on the other nodes.

+
+
+

Starting SINGA training on Mesos

+

Assuming that Mesos and HDFS are already started, a SINGA job can be launched from any container.

+
+
+

Launching job

+ +
    + +
  1. Log in to any container, then cd incubator-singa/tool/mesos
  2. + +
  3. Check that configuration files are correct: + +
      + +
    • scheduler.conf contains information about the master nodes
    • + +
    • singa.conf contains information about Zookeeper node0
    • + +
  • The job configuration file job.conf contains the full path to the example directories (no relative paths!).
    • +
  4. + +
  5. +

    Start the job:

    + +
      + +
    • If starting for the first time:
    • +
    + +
    +
          ./scheduler <job config file> -scheduler_conf <scheduler config file> -singa_conf <SINGA config file>
    +
    + +
      + +
    • If not the first time:
    • +
    + +
    +
          ./scheduler <job config file>
    +
  6. +
+

Note: each running job is given a frameworkID. Look for a log message of the form:

+ +
+
         Framework registered with XXX-XXX-XXX-XXX-XXX-XXX
+
+
+

Monitoring and Debugging

+

Each Mesos job is given a frameworkID, and a sandbox directory is created for each job. The directory is under the specified work_dir (or /tmp/mesos by default). For example, errors during SINGA execution can be found at:

+ +
+
        /tmp/mesos/slaves/xxxxx-Sx/frameworks/xxxxx/executors/SINGA_x/runs/latest/stderr
+
+

Other artifacts, like files downloaded from HDFS (job.conf) and stdout, can be found in the same directory.

+
+

Stopping

+

There are two ways to kill a running job:

+ +
    + +
  1. +

    If the scheduler is running in the foreground, simply kill it (using Ctrl-C, for example).

  2. + +
  3. +

    If the scheduler is running in the background, kill it using Mesos’s REST API:

    + +
    +
      curl -d "frameworkId=XXX-XXX-XXX-XXX-XXX-XXX" -X POST http://<master>/master/shutdown
    +
  4. +
+
+
+
+ +
+ +
+
+
+ +

Copyright © 2015 The Apache Software Foundation. All rights reserved. Apache Singa, Apache, the Apache feather logo, and the Apache Singa project logos are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.

+
+ + +
+
Added: websites/staging/singa/trunk/content/v0.3.0/kr/mlp.html
==============================================================================
--- websites/staging/singa/trunk/content/v0.3.0/kr/mlp.html (added)
+++ websites/staging/singa/trunk/content/v0.3.0/kr/mlp.html Wed Apr 20 05:12:03 2016
@@ -0,0 +1,521 @@

Apache SINGA – MLP Example
+ + + + + +
+
+ +
+ + +
+ +

MLP Example

+
+

The multilayer perceptron (MLP) is a subclass of feed-forward neural networks. An MLP typically consists of multiple directly connected layers, with each layer fully connected to the next one. In this example, we will use SINGA to train a simple MLP model proposed by Ciresan for classifying handwritten digits from the MNIST dataset.

+
+

Running instructions

+

Please refer to the installation page for instructions on building SINGA, and the quick start for instructions on starting zookeeper.

+

We have provided scripts for preparing the training and test datasets in examples/mnist/.

+ +
+
# in examples/mnist
+$ cp Makefile.example Makefile
+$ make download
+$ make create
+
+

After the datasets are prepared, we start the training by

+ +
+
./bin/singa-run.sh -conf examples/mnist/job.conf
+
+

After it is started, you should see output like

+ +
+
Record job information to /tmp/singa-log/job-info/job-1-20150817-055231
+Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1
+E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073)
+E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start
+E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start
+E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, accuracy : 0.109100
+E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, accuracy : 0.099000
+E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800
+E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 2.091030, accuracy : 0.327200
+E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 1.969412, accuracy : 0.442100
+E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 1.865466, accuracy : 0.514800
+E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 1.773849, accuracy : 0.569100
+E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, accuracy : 0.662100
+E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 1.659150, accuracy : 0.652600
+E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 1.574024, accuracy : 0.666000
+E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 1.529380, accuracy : 0.670500
+E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 1.443911, accuracy : 0.703500
+E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000
+E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500
+E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900
+
+

After training for a number of steps (depending on the setting) or after the job finishes, SINGA checkpoints the model parameters.

+
+

Details

+

To train a model in SINGA, you need to prepare the datasets, and a job configuration which specifies the neural net structure, training algorithm (BP or CD), SGD update algorithm (e.g. Adagrad), number of training/test steps, etc.

+
+

Data preparation

+

Before using SINGA, you need to write a program to pre-process the dataset into a format that SINGA can read. Please refer to the Data Preparation page for details about preparing the MNIST dataset.

+
+

Neural net

+ +
+ +
Figure 1 - Net structure of the MLP example. +
+

Figure 1 shows the structure of the simple MLP model, which is constructed following Ciresan’s paper. The dashed circle contains two layers which represent one feature transformation stage. There are 6 such stages in total. The sizes of the InnerProductLayers in these stages decrease from 2500->2000->1500->1000->500->10.

+

Next we follow the guide in neural net page and layer page to write the neural net configuration.

+ +
    + +
  • +

    We configure an input layer to read the training/testing records from a disk file.

    + +
    +
    layer {
    +    name: "data"
    +    type: kRecordInput
    +    store_conf {
    +      backend: "kvfile"
    +      path: "examples/mnist/train_data.bin"
    +      random_skip: 5000
    +      batchsize: 64
    +      shape: 784
    +      std_value: 127.5
    +      mean_value: 127.5
    +     }
    +     exclude: kTest
    +  }
    +
    +layer {
    +    name: "data"
    +    type: kRecordInput
    +    store_conf {
    +      backend: "kvfile"
    +      path: "examples/mnist/test_data.bin"
    +      batchsize: 100
    +      shape: 784
    +      std_value: 127.5
    +      mean_value: 127.5
    +     }
    +     exclude: kTrain
    +  }
    +
  • +
+ +
    + +
  • +

    All InnerProductLayers are configured similarly as,

    + +
    +
    layer{
    +  name: "fc1"
    +  type: kInnerProduct
    +  srclayers:"data"
    +  innerproduct_conf{
    +    num_output: 2500
    +  }
    +  param{
    +    name: "w1"
    +    ...
    +  }
    +  param{
    +    name: "b1"
    +    ..
    +  }
    +}
    +
    +

    with the num_output decreasing from 2500 to 10.

  • + +
  • +

    An STanhLayer is connected to every InnerProductLayer except the last one. It transforms the feature via the scaled tanh function.

    + +
    +
    layer{
    +  name: "tanh1"
    +  type: kSTanh
    +  srclayers:"fc1"
    +}
    +
  • + +
  • +

    The final Softmax loss layer connects to LabelLayer and the last STanhLayer.

    + +
    +
    layer{
    +  name: "loss"
    +  type:kSoftmaxLoss
    +  softmaxloss_conf{ topk:1 }
    +  srclayers:"fc6"
    +  srclayers:"data"
    +}
    +
  • +
+
+

Updater

+

The normal SGD updater is selected. The learning rate shrinks by 0.997 every 60 steps (i.e., one epoch).

+ +
+
updater{
+  type: kSGD
+  learning_rate{
+    base_lr: 0.001
+    type : kStep
+    step_conf{
+      change_freq: 60
+      gamma: 0.997
+    }
+  }
+}
+
+
+

TrainOneBatch algorithm

+

The MLP model is a feed-forward model, hence the back-propagation algorithm is selected.

+ +
+
train_one_batch {
+  alg: kBP
+}
+
+
+

Cluster setting

+

The following configuration sets a single worker and server for training. The training frameworks page introduces configurations for a couple of distributed training frameworks.

+ +
+
cluster {
+  nworker_groups: 1
+  nserver_groups: 1
+}
+
+
+
+
+ +
+ + + +