Subject: svn commit: r985457 [26/35] - in /websites/staging/singa/trunk/content: ./ community/ develop/ docs/ docs/jp/ docs/kr/ docs/zh/ releases/ v0.1.0/ v0.2.0/ v0.2.0/jp/ v0.2.0/kr/ v0.2.0/zh/
Date: Tue, 12 Apr 2016 06:24:54 -0000
To: commits@singa.incubator.apache.org
From: buildbot@apache.org

Added: websites/staging/singa/trunk/content/v0.2.0/kr/test.html
==============================================================================
--- websites/staging/singa/trunk/content/v0.2.0/kr/test.html (added)
+++ websites/staging/singa/trunk/content/v0.2.0/kr/test.html Tue Apr 12 06:24:50 2016
@@ -0,0 +1,436 @@
Performance Test and Feature Extraction

+
+

Once SINGA finishes training a model, it checkpoints the model parameters into disk files under the checkpoint folder. Model parameters can also be dumped into this folder periodically during training if the checkpoint configuration (see checkpoint.html) fields are set. With the checkpoint files, we can load the model parameters to conduct performance tests, feature extraction and prediction against new data.

+

To load the model parameters from checkpoint files, we need to add the paths of checkpoint files in the job configuration file

+ +
+
checkpoint_path: PATH_TO_CHECKPOINT_FILE1
+checkpoint_path: PATH_TO_CHECKPOINT_FILE2
+...
+
+

The new dataset is configured by specifying the test_steps field and the data input layer. E.g., the following configuration is for a dataset with 100*100 instances, i.e., 100 test steps with a batch size of 100.

+ +
+
test_steps: 100
+net {
+  layer {
+    name: "input"
+    store_conf {
+      backend: "kvfile"
+      path: PATH_TO_TEST_KVFILE
+      batchsize: 100
+    }
+  }
+  ...
+}
+
+
+

Performance Test

+

This application tests the performance, e.g., accuracy, of a previously trained model. Depending on the application, the test data may or may not have ground truth labels. For example, if the model is trained for image classification, the test images must have ground truth labels to calculate the accuracy; if the model is an auto-encoder, the performance can be measured by the reconstruction error, which does not require extra labels. In both cases, there must be a layer that calculates the performance, e.g., the SoftmaxLossLayer.
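As a rough sketch, such a performance-measuring layer could be configured in the job configuration as follows; the layer names ("ip1", "data") and the softmaxloss_conf fields follow the cifar10 example and are meant as an illustration rather than a verbatim excerpt:

    layer {
      name: "loss"
      type: kSoftmaxLoss
      softmaxloss_conf {
        topk: 1          # report top-1 accuracy
      }
      srclayers: "ip1"   # layer producing the predictions
      srclayers: "data"  # layer providing the ground truth labels (name assumed)
    }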

+

The job configuration file for the cifar10 example can be used directly for testing after adding the checkpoint path. The running command is

+ +
+
$ ./bin/singa-run.sh -conf examples/cifar10/job.conf -test
+
+

The performance is printed on the screen, e.g.,

+ +
+
Load from checkpoint file examples/cifar10/checkpoint/step50000-worker0
+accuracy = 0.728000, loss = 0.807645
+
+
+

Feature extraction

+

Since deep learning models are good at learning features, feature extraction is a major use of them; e.g., we can extract features from the fully connected layers of AlexNet as image features for image retrieval. To extract the features from one layer, we simply add an output layer after that layer. For instance, to extract features from the fully connected layer (with name ip1) of the cifar10 example model, we replace the SoftmaxLossLayer with a CSVOutputLayer, which writes the features into a CSV file,

+ +
+
layer {
+  name: "ip1"
+}
+layer {
+  name: "output"
+  type: kCSVOutput
+  srclayers: "ip1"
+  store_conf {
+    backend: "textfile"
+    path: OUTPUT_FILE_PATH
+  }
+}
+
+

The input layer, test steps and running command are the same as in the Performance Test section.

+
+

Label Prediction

+

If the output layer is connected to a layer that predicts labels of images, the output layer then writes the prediction results into files. SINGA provides two built-in layers for generating prediction results, namely:

+ +
    + +
  • SoftmaxLayer, which generates the probability of each candidate label.
  • + +
  • ArgSortLayer, which sorts labels by probability in descending order and keeps the top-k labels.
  • +
+

By connecting the two layers with the previous layer and the output layer, we can extract the predictions of each instance. For example,

+ +
+
layer {
+  name: "feature"
+  ...
+}
+layer {
+  name: "softmax"
+  type: kSoftmax
+  srclayers: "feature"
+}
+layer {
+  name: "prediction"
+  type: kArgSort
+  srclayers: "softmax"
+  argsort_conf {
+    topk: 5
+  }
+}
+layer {
+  name: "output"
+  type: kCSVOutput
+  srclayers: "prediction"
+  store_conf {}
+}
+
+

The top-5 labels of each instance are written as one line of the output CSV file. Currently, the above layers cannot co-exist with the loss layers used for training, so please comment out the loss layers when extracting prediction results.
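For illustration only, one line of the resulting CSV file could then look like the following (made-up label indices):

    3,8,0,5,2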

Added: websites/staging/singa/trunk/content/v0.2.0/kr/train-one-batch.html
==============================================================================
--- websites/staging/singa/trunk/content/v0.2.0/kr/train-one-batch.html (added)
+++ websites/staging/singa/trunk/content/v0.2.0/kr/train-one-batch.html Tue Apr 12 06:24:50 2016
@@ -0,0 +1,478 @@
Train-One-Batch

+
+

For each SGD iteration, every worker calls the TrainOneBatch function to compute gradients of parameters associated with local layers (i.e., layers dispatched to it). SINGA has implemented two algorithms for the TrainOneBatch function. Users select the corresponding algorithm for their model in the configuration.

+
+

Basic user guide

+
+

Back-propagation

+

The BP algorithm is used for computing gradients of feed-forward models, e.g., CNN and MLP, as well as RNN models in SINGA.

+ +
+
# in job.conf
+alg: kBP
+
+

To use the BP algorithm for the TrainOneBatch function, users simply set the alg field to kBP. If a neural net contains user-defined layers, these layers must be implemented properly to be consistent with the implementation of the BP algorithm in SINGA (see below).

+
+

Contrastive Divergence

+

The CD algorithm is used for computing gradients of energy models like RBM.

+ +
+
# job.conf
+alg: kCD
+cd_conf {
+  cd_k: 2
+}
+
+

To use the CD algorithm for the TrainOneBatch function, users simply set the alg field to kCD. Users can also configure the number of Gibbs sampling steps in the CD algorithm through the cd_k field; by default, it is set to 1.

+
+

Advanced user guide

+
+

Implementation of BP

+

The BP algorithm is implemented in SINGA following the pseudo code below,

+ +
+
BPTrainOneBatch(step, net) {
+  // forward propagate
+  foreach layer in net.local_layers() {
+    if IsBridgeDstLayer(layer)
+      recv data from the src layer (i.e., BridgeSrcLayer)
+    foreach param in layer.params()
+      Collect(param) // recv response from servers for last update
+
+    layer.ComputeFeature(kForward)
+
+    if IsBridgeSrcLayer(layer)
+      send layer.data_ to dst layer
+  }
+  // backward propagate
+  foreach layer in reverse(net.local_layers) {
+    if IsBridgeSrcLayer(layer)
+      recv gradient from the dst layer (i.e., BridgeDstLayer)
+      recv response from servers for last update
+
+    layer.ComputeGradient()
+    foreach param in layer.params()
+      Update(step, param) // send param.grad_ to servers
+
+    if IsBridgeDstLayer(layer)
+      send layer.grad_ to src layer
+  }
+}
+
+

It forwards features through all local layers (locality can be checked by the layer partition ID and worker ID) and propagates gradients backwards in the reverse order. BridgeSrcLayer (resp. BridgeDstLayer) is blocked until the feature (resp. gradient) from the source (resp. destination) layer arrives. Parameter gradients are sent to servers via the Update function. Updated parameters are collected via the Collect function, which blocks until the parameter has been updated. Param objects have versions, which can be used to check whether a Param object has been updated or not.

+

Since RNN models are unrolled into feed-forward models, users need to implement the forward propagation in the recurrent layer's ComputeFeature function, and the backward propagation in the recurrent layer's ComputeGradient function. As a result, the whole TrainOneBatch runs the back-propagation through time (BPTT) algorithm.

+
+

Implementation of CD

+

The CD algorithm is implemented in SINGA following the pseudo code below,

+ +
+
CDTrainOneBatch(step, net) {
+  # positive phase
+  foreach layer in net.local_layers()
+    if IsBridgeDstLayer(layer)
+      recv positive phase data from the src layer (i.e., BridgeSrcLayer)
+    foreach param in layer.params()
+      Collect(param)  // recv response from servers for last update
+    layer.ComputeFeature(kPositive)
+    if IsBridgeSrcLayer(layer)
+      send positive phase data to dst layer
+
+  # negative phase
+  foreach gibbs in [0...layer_proto_.cd_k]
+    foreach layer in net.local_layers()
+      if IsBridgeDstLayer(layer)
+        recv negative phase data from the src layer (i.e., BridgeSrcLayer)
+      layer.ComputeFeature(kNegative)
+      if IsBridgeSrcLayer(layer)
+        send negative phase data to dst layer
+
+  foreach layer in net.local_layers()
+    layer.ComputeGradient()
+    foreach param in layer.params
+      Update(param)
+}
+
+

Parameter gradients are computed after the positive phase and negative phase.

+
+

Implementing a new algorithm

+

SINGA implements BP and CD by creating two subclasses of the Worker class: BPWorker’s TrainOneBatch function implements the BP algorithm; CDWorker’s TrainOneBatch function implements the CD algorithm. To implement a new algorithm for the TrainOneBatch function, users need to create a new subclass of the Worker, e.g.,

+ +
+
class FooWorker : public Worker {
+  void TrainOneBatch(int step, shared_ptr<NeuralNet> net, Metric* perf) override;
+  void TestOneBatch(int step, Phase phase, shared_ptr<NeuralNet> net, Metric* perf) override;
+};
+
+

The FooWorker must implement the above two functions for training one mini-batch and testing one mini-batch. The perf argument is for collecting training or testing performance, e.g., the objective loss or accuracy. It is passed to the ComputeFeature function of each layer.
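For concreteness, a very rough sketch of such a TrainOneBatch is shown below. It simply reuses the BP-style loop from the pseudo code above; the accessors (net->layers(), layer->GetParams()) and the exact signatures of Collect, Update, ComputeFeature and ComputeGradient are assumptions made for illustration, not the exact SINGA API.

    void FooWorker::TrainOneBatch(int step, shared_ptr<NeuralNet> net, Metric* perf) {
      // forward pass over the local layers
      for (auto* layer : net->layers()) {        // accessor assumed
        for (auto* param : layer->GetParams())   // accessor assumed
          Collect(param);                        // wait for updated values from servers
        layer->ComputeFeature(kForward, perf);   // signature follows the pseudo code
      }
      // backward pass in reverse order
      auto layers = net->layers();
      for (auto it = layers.rbegin(); it != layers.rend(); ++it) {
        (*it)->ComputeGradient();
        for (auto* param : (*it)->GetParams())
          Update(step, param);                   // send gradients to servers
      }
    }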

+

Users can define configuration fields for FooWorker, e.g.,

+ +
+
# in user.proto
+message FooWorkerProto {
+  optional int32 b = 1;
+}
+
+extend JobProto {
+  optional FooWorkerProto foo_conf = 101;
+}
+
+# in job.proto
+message JobProto {
+  ...
+  extensions 101 to max;
+}
+
+

This is similar to adding configuration fields for a new layer.

+

To use FooWorker, users need to register it in main.cc and configure the alg and foo_conf fields,

+ +
+
# in main.cc
+const int kFoo = 3; // worker ID, must be different from that of CDWorker and BPWorker
+driver.RegisterWorker<FooWorker>(kFoo);
+
+# in job.conf
+...
+alg: 3
+[foo_conf] {
+  b: 4
+}
+
+
Added: websites/staging/singa/trunk/content/v0.2.0/kr/updater.html
==============================================================================
--- websites/staging/singa/trunk/content/v0.2.0/kr/updater.html (added)
+++ websites/staging/singa/trunk/content/v0.2.0/kr/updater.html Tue Apr 12 06:24:50 2016
@@ -0,0 +1,612 @@

Updater

+
+

Every server in SINGA has an Updater instance that updates parameters based on gradients. In this page, the Basic user guide describes the configuration of an updater. The Advanced user guide presents details on how to implement a new updater and a new learning-rate changing method.

+
+

Basic user guide

+

There are many different parameter updating protocols (i.e., subclasses of Updater). They share some configuration fields like

+ +
    + +
  • type, an integer for identifying an updater;
  • + +
  • learning_rate, configuration for the LRGenerator which controls the learning rate.
  • + +
  • weight_decay, the coefficient for L2 regularization.
  • + +
  • momentum.
  • +
+

If you are not familiar with the above terms, you can find their meanings on this page provided by Karpathy.

+
+

Configuration of built-in updater classes

+
+

Updater

+

The base Updater implements the vanilla SGD algorithm. Its configuration type is kSGD. Users need to configure at least the learning_rate field. momentum and weight_decay are optional fields.

+ +
+
updater{
+  type: kSGD
+  momentum: float
+  weight_decay: float
+  learning_rate {
+    ...
+  }
+}
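For reference, these fields enter the vanilla SGD update roughly as follows (written in the same informal style as the learning-rate formulas below; this is the textbook rule, not an excerpt from the SINGA source):

    grad    = grad + weight_decay * param            # L2 regularization
    history = momentum * history - learning_rate * grad
    param   = param + history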
+
+
+

AdaGradUpdater

+

It inherits the base Updater to implement the AdaGrad algorithm. Its type is kAdaGrad. AdaGradUpdater is configured similarly to Updater, except that momentum is not used.
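The AdaGrad rule it follows is, informally (epsilon denotes a small constant for numerical stability):

    history = history + grad * grad
    param   = param - learning_rate * grad / (sqrt(history) + epsilon)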

+
+

NesterovUpdater

+

It inherits the base Updater to implement the Nesterov (section 3.5) updating protocol. Its type is kNesterov. learning_rate and momentum must be configured. weight_decay is an optional configuration field.
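Informally, the Nesterov update follows the commonly used reformulation below (not an excerpt from the SINGA source):

    v_prev = v
    v      = momentum * v - learning_rate * grad
    param  = param - momentum * v_prev + (1 + momentum) * v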

+
+

RMSPropUpdater

+

It inherits the base Updater to implement the RMSProp algorithm proposed by Hinton (slide 29). Its type is kRMSProp.

+ +
+
updater {
+  type: kRMSProp
+  rmsprop_conf {
+   rho: float # [0,1]
+  }
+}
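The rho field is the decay factor of the standard RMSProp rule, which is, informally (epsilon denotes a small constant):

    mean_square = rho * mean_square + (1 - rho) * grad * grad
    param       = param - learning_rate * grad / sqrt(mean_square + epsilon)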
+
+
+

Configuration of learning rate

+

The learning_rate field is configured as,

+ +
+
learning_rate {
+  type: ChangeMethod
+  base_lr: float  # base/initial learning rate
+  ... # fields to a specific changing method
+}
+
+

The common fields include type and base_lr. SINGA provides the following ChangeMethods.

+
+

kFixed

+

The base_lr is used for all steps.
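A minimal configuration could look like the following (the value 0.01 is only illustrative):

    learning_rate {
      type: kFixed
      base_lr: 0.01
    }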

+
+

kLinear

+

The updater should be configured like

+ +
+
learning_rate {
+  base_lr:  float
+  linear_conf {
+    freq: int
+    final_lr: float
+  }
+}
+
+

Linear interpolation is used to change the learning rate,

+ +
+
lr = (1 - step / freq) * base_lr + (step / freq) * final_lr
+
+
+

kExponential

+

The updater should be configured like

+ +
+
learning_rate {
+  base_lr: float
+  exponential_conf {
+    freq: int
+  }
+}
+
+

The learning rate for step is

+ +
+
lr = base_lr / 2^(step / freq)
+
+
+

kInverseT

+

The updater should be configured like

+ +
+
learning_rate {
+  base_lr: float
+  inverset_conf {
+    final_lr: float
+  }
+}
+
+

The learning rate for step is

+ +
+
lr = base_lr / (1 + step / final_lr)
+
+
+

kInverse

+

The updater should be configured like

+ +
+
learning_rate {
+  base_lr: float
+  inverse_conf {
+    gamma: float
+    pow: float
+  }
+}
+
+

The learning rate for step is

+ +
+
lr = base_lr * (1 + gamma * step)^(-pow)
+
+
+

kStep

+

The updater should be configured like

+ +
+
learning_rate {
+  base_lr : float
+  step_conf {
+    change_freq: int
+    gamma: float
+  }
+}
+
+

The learning rate for step is

+ +
+
lr = base_lr * gamma^(step / change_freq)
+
+
+

kFixedStep

+

The updater should be configured like

+ +
+
learning_rate {
+  fixedstep_conf {
+    step: int
+    step_lr: float
+
+    step: int
+    step_lr: float
+
+    ...
+  }
+}
+
+

Denote the i-th tuple as (step[i], step_lr[i]), then the learning rate for step is,

+ +
+
step_lr[k]
+
+

where step[k] is the largest configured step that does not exceed the current step, i.e., step_lr[i] is used from step step[i] until step[i+1].
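For example, with the configuration below (the values are illustrative), the learning rate is 0.001 for steps 0-59999, 0.0001 for steps 60000-64999, and 0.00001 from step 65000 onwards:

    learning_rate {
      type: kFixedStep
      fixedstep_conf {
        step: 0
        step_lr: 0.001
        step: 60000
        step_lr: 0.0001
        step: 65000
        step_lr: 0.00001
      }
    }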

+
+

Advanced user guide

+
+

Implementing a new Updater subclass

+

The base Updater class has one virtual function,

+ +
+
class Updater{
+ public:
+  virtual void Update(int step, Param* param, float grad_scale = 1.0f) = 0;
+
+ protected:
+  UpdaterProto proto_;
+  LRGenerator lr_gen_;
+};
+
+

It updates the values of the param based on its gradients. The step argument is for deciding the learning rate, which may change over time (i.e., with step). grad_scale scales the original gradient values. This function is called by a server once it has received all gradients for the same Param object.

+

To implement a new Updater subclass, users must override the Update function.

+ +
+
class FooUpdater : public Updater {
+  void Update(int step, Param* param, float grad_scale = 1.0f) override;
+};
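A minimal sketch of such an Update is given below. It performs a plain SGD step; the Param accessors (mutable_cpu_data(), cpu_grad(), size()) and the proto_ field accessor are assumptions made for illustration, not the exact SINGA API.

    void FooUpdater::Update(int step, Param* param, float grad_scale) {
      float lr = lr_gen_.Get(step);              // learning rate for this step
      float wd = proto_.weight_decay();          // field accessor assumed
      float* data = param->mutable_cpu_data();   // accessor assumed
      const float* grad = param->cpu_grad();     // accessor assumed
      for (int i = 0; i < param->size(); ++i) {
        float g = grad_scale * grad[i] + wd * data[i];   // scaled gradient plus L2 term
        data[i] -= lr * g;                               // vanilla SGD step
      }
    }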
+
+

Configuration of this new updater can be declared similarly to that of a new layer,

+ +
+
# in user.proto
+message FooUpdaterProto {
+  optional int32 c = 1;
+}
+
+extend UpdaterProto {
+  optional FooUpdaterProto fooupdater_conf = 101;
+}
+
+

The new updater should be registered in the main function

+ +
+
driver.RegisterUpdater<FooUpdater>("FooUpdater");
+
+

Users can then configure the job as

+ +
+
# in job.conf
+updater {
+  user_type: "FooUpdater"  # must use user_type with the same string identifier as the one used for registration
+  fooupdater_conf {
+    c : 20;
+  }
+}
+
+
+

Implementing a new LRGenerator subclass

+

The base LRGenerator declares one virtual function,

+ +
+
virtual float Get(int step);
+
+

To implement a subclass, e.g., FooLRGen, users should declare it like

+ +
+
class FooLRGen : public LRGenerator {
+ public:
+  float Get(int step) override;
+};
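As an illustration, a trivial implementation could look like this; the schedule itself is made up, and a real generator would normally read its parameters from foolr_conf:

    float FooLRGen::Get(int step) {
      // made-up schedule: start at 0.1 and decay with the inverse of the step
      return 0.1f / (1.0f + 0.0001f * step);
    }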
+
+

Configuration of FooLRGen can be defined using a protocol message,

+ +
+
# in user.proto
+message FooLRProto {
+ ...
+}
+
+extend LRGenProto {
+  optional FooLRProto foolr_conf = 101;
+}
+
+

The configuration is then like,

+ +
+
learning_rate {
+  user_type : "FooLR" # must use user_type with the same string identifier as the one used for registration
+  base_lr: float
+  foolr_conf {
+    ...
+  }
+}
+
+

Users have to register this subclass in the main function,

+ +
+
  driver.RegisterLRGenerator<FooLRGen, std::string>("FooLR");
+
+