singa-commits mailing list archives

From wang...@apache.org
Subject [1/3] incubator-singa git commit: SINGA-395 Add documentation for autograd APIs
Date Fri, 23 Nov 2018 08:41:54 GMT
Repository: incubator-singa
Updated Branches:
  refs/heads/master 99bae0209 -> 3d688be4e


SINGA-395 Add documentation for autograd APIs

updated the doc page for autograd
1. fix some typos
2. change the xception net example to two simple examples


Project: http://git-wip-us.apache.org/repos/asf/incubator-singa/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-singa/commit/8143cf40
Tree: http://git-wip-us.apache.org/repos/asf/incubator-singa/tree/8143cf40
Diff: http://git-wip-us.apache.org/repos/asf/incubator-singa/diff/8143cf40

Branch: refs/heads/master
Commit: 8143cf4051ee5f141d875cba5a974ca417bc5848
Parents: 4a1b1e2
Author: zmeihui <cherry850330@gmail.com>
Authored: Sun Nov 18 14:56:35 2018 +0800
Committer: zmeihui <cherry850330@gmail.com>
Committed: Sun Nov 18 14:56:35 2018 +0800

----------------------------------------------------------------------
 doc/en/docs/autograd.md     | 146 ++++++++++++++++++++++++
 doc/en/docs/autograd_doc.md | 241 ---------------------------------------
 2 files changed, 146 insertions(+), 241 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-singa/blob/8143cf40/doc/en/docs/autograd.md
----------------------------------------------------------------------
diff --git a/doc/en/docs/autograd.md b/doc/en/docs/autograd.md
new file mode 100644
index 0000000..6070629
--- /dev/null
+++ b/doc/en/docs/autograd.md
@@ -0,0 +1,146 @@
+# Autograd in Singa
+
+There are two typical ways to implement autograd: symbolic differentiation, as in [Theano](http://deeplearning.net/software/theano/index.html), and reverse-mode differentiation, as in [Pytorch](https://pytorch.org/docs/stable/notes/autograd.html). Singa follows the Pytorch way, which records the computation graph during the forward pass and applies backward propagation automatically afterwards. The autograd algorithm is explained in detail [here](https://pytorch.org/docs/stable/notes/autograd.html). Below we explain the relevant modules in Singa and give examples to illustrate the usage.
+
+## Relevant Modules
+
+There are three classes involved in autograd, namely `singa.tensor.Tensor`, `singa.autograd.Operation`, and `singa.autograd.Layer`. In the rest of this article, we use tensor, operation and layer to refer to an instance of the respective class.
+
+### Tensor
+
+Three attributes of Tensor are used by autograd:
+- `.creator` is an `Operation` instance. It records the operation that generated the tensor.
+- `.requires_grad` is a boolean variable. It indicates that the autograd algorithm needs to compute the gradient of the tensor (i.e., its owner). For example, during backpropagation, the gradients of the tensors for the weight matrix of a linear layer and for the feature maps of a convolution layer (except the bottom layer) should be computed.
+- `.stores_grad` is a boolean variable. It indicates that the gradient of the owner tensor should be stored and output by the backward function. For example, the gradient of the feature maps is computed during backpropagation, but is not included in the output of the backward function.
+
+Programmers can change `requires_grad` and `stores_grad` of a Tensor instance. For example, if the latter is set to True, the corresponding gradient is included in the output of the backward function. Note that if `stores_grad` is True, then `requires_grad` must also be True, but not vice versa.
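+
+For instance, the sketch below creates a parameter tensor and an input tensor (the shapes are arbitrary; the same constructor arguments are used in the MLP example below):
+
+```
+from singa.tensor import Tensor
+
+# parameter tensor: its gradient is computed and returned by the backward function
+w = Tensor(shape=(3, 2), requires_grad=True, stores_grad=True)
+w.gaussian(0.0, 0.1)
+
+# input tensor: no gradient is needed for it
+x = Tensor(shape=(4, 3), requires_grad=False, stores_grad=False)
+x.gaussian(0.0, 1.0)
+```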
+
+
+### Operation
+
+An `Operation` takes one or more `Tensor` instances as input and outputs one or more `Tensor` instances. For example, ReLU can be implemented as a specific `Operation` subclass. When an `Operation` instance is called (after instantiation), the following two steps are executed:
+
+1. record the source operations, i.e., the `creator`s of the input tensors
+2. do the calculation by calling the member function `.forward()`
+
+There are two member functions for the forward and backward passes, i.e., `.forward()` and `.backward()`. They take `Tensor.data` as inputs (the type is `CTensor`), and output `CTensor`s. To add a specific operation, the `Operation` subclass should implement its own `.forward()` and `.backward()`. The `backward()` function is called automatically by the `backward()` function of autograd during backward propagation to compute the gradients of the inputs (according to the `requires_grad` field).
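+
+The sketch below illustrates the call mechanics with `autograd.matmul`, which also appears in the MLP example below (the shapes and values are arbitrary):
+
+```
+from singa.tensor import Tensor
+from singa import autograd
+
+autograd.training = True    # set training mode, as in the training examples below
+
+a = Tensor(shape=(2, 3), requires_grad=True, stores_grad=True)
+a.gaussian(0.0, 0.1)
+b = Tensor(shape=(3, 4), requires_grad=True, stores_grad=True)
+b.gaussian(0.0, 0.1)
+
+c = autograd.matmul(a, b)   # step 1: record the creators of a and b; step 2: run forward() on a.data and b.data
+print(c.creator)            # the operation instance that generated c
+```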
+
+### Layer
+
+For those operations that require parameters, we package them into a new class, `Layer`. For example, the convolution operation is wrapped into a convolution layer. A `Layer` manages (stores) the parameters and calls the corresponding `Operation`s to implement the transformation.
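+
+A minimal sketch of using a layer (the sizes are arbitrary; `autograd.Linear` also appears in the CNN example below):
+
+```
+from singa.tensor import Tensor
+from singa import autograd
+
+autograd.training = True
+
+fc = autograd.Linear(3, 2)    # the layer creates and stores its own weight and bias tensors
+x = Tensor(shape=(4, 3))
+x.gaussian(0.0, 1.0)
+y = fc(x)                     # calls the corresponding operations (matmul and add_bias) on x and the parameters
+```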
+
+
+
+## Examples
+
+Multiple examples are provided in the [example folder](https://github.com/apache/incubator-singa/tree/master/examples/autograd). We explain two representative examples here.
+
+### Operation only
+
+The following code implements an MLP model using only Operation instances (no Layer instances).
+
+#### Import packages
+
+```
+from singa.tensor import Tensor
+from singa import autograd
+from singa import opt
+```
+
+#### Create weight matrix and bias vector
+
+The parameter tensors are created with both `requires_grad` and `stores_grad` set to True.
+
+```
+w0 = Tensor(shape=(2, 3), requires_grad=True, stores_grad=True)
+w0.gaussian(0.0, 0.1)
+b0 = Tensor(shape=(1, 3), requires_grad=True, stores_grad=True)
+b0.set_value(0.0)
+
+w1 = Tensor(shape=(3, 2), requires_grad=True, stores_grad=True)
+w1.gaussian(0.0, 0.1)
+b1 = Tensor(shape=(1, 2), requires_grad=True, stores_grad=True)
+b1.set_value(0.0)
+```
+
+#### Training
+
+```
+inputs = Tensor(data=data)  # data: the input feature matrix (a numpy array prepared beforehand)
+target = Tensor(data=label) # label: the corresponding ground-truth labels
+autograd.training = True    # enable training mode
+sgd = opt.SGD(0.05)         # SGD optimizer with learning rate 0.05
+
+for i in range(10):
+    x = autograd.matmul(inputs, w0) # matrix multiplication
+    x = autograd.add_bias(x, b0)    # add the bias vector
+    x = autograd.relu(x)            # ReLU activation operation
+
+    x = autograd.matmul(x, w1)
+    x = autograd.add_bias(x, b1)
+    
+    loss = autograd.softmax_cross_entropy(x, target)
+    
+    for p, g in autograd.backward(loss):        
+        sgd.update(p, g)
+```
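+
+To monitor training, the loss can be copied back into a numpy array, e.g. (a small sketch, assuming `from singa import tensor` in addition to the imports above):
+
+```
+from singa import tensor
+
+print("training loss = ", tensor.to_numpy(loss)[0])
+```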
+
+
+### Operation + Layer
+
+The following [example](https://github.com/apache/incubator-singa/blob/master/examples/autograd/mnist_cnn.py) implements a CNN model using the layers provided by the autograd module.
+
+#### Create the layers
+
+```
+conv1 = autograd.Conv2d(1, 32, 3, padding=1, bias=False)  # 1 input channel, 32 output channels, 3x3 kernel
+bn1 = autograd.BatchNorm2d(32)
+pooling1 = autograd.MaxPool2d(3, 1, padding=1)             # 3x3 max pooling with stride 1
+conv21 = autograd.Conv2d(32, 16, 3, padding=1)             # two parallel branches with 16 channels each
+conv22 = autograd.Conv2d(32, 16, 3, padding=1)
+bn2 = autograd.BatchNorm2d(32)                             # 16 + 16 = 32 channels after concatenation
+linear = autograd.Linear(32 * 28 * 28, 10)                 # flattened 28x28 feature maps, 10 classes
+pooling2 = autograd.AvgPool2d(3, 1, padding=1)             # 3x3 average pooling with stride 1
+```
+
+#### Define the forward function
+
+The operations in the forward pass will be recorded automatically for backward propagation.
+
+```
+def forward(x, t):
+    # x is the input data (a batch of images)
+    # t is the label vector (a batch of integers)
+    y = conv1(x)           # Conv layer  
+    y = autograd.relu(y)   # ReLU operation
+    y = bn1(y)             # BN layer
+    y = pooling1(y)        # Pooling Layer
+    
+    # two parallel convolution layers
+    y1 = conv21(y)
+    y2 = conv22(y)
+    y = autograd.cat((y1, y2), 1)  # cat operation
+    y = autograd.relu(y)           # ReLU operation
+    y = bn2(y)
+    y = pooling2(y)
+
+    y = autograd.flatten(y)        # flatten operation
+    y = linear(y)                  # Linear layer
+    loss = autograd.softmax_cross_entropy(y, t)  # operation 
+    return loss, y
+```
+
+#### Training
+
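+The loop below relies on several names prepared earlier in the full example (`dev`, `sgd`, `x_train`, `y_train`, `batch_sz`, `batch_number`, `epochs`). A minimal sketch of such a setup is shown here; the device choice, hyper-parameter values and the random placeholder data are illustrative only:
+
+```
+import numpy as np
+from singa import device, opt
+
+dev = device.get_default_device()   # or device.create_cuda_gpu() to run on a GPU
+sgd = opt.SGD(lr=0.01)              # illustrative learning rate
+
+# placeholders standing in for the preprocessed MNIST images and one-hot labels
+x_train = np.random.randn(512, 1, 28, 28).astype(np.float32)
+y_train = np.eye(10, dtype=np.float32)[np.random.randint(0, 10, 512)]
+
+batch_sz = 64
+epochs = 2
+batch_number = x_train.shape[0] // batch_sz
+```
+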
+```
+autograd.training = True
+for epoch in range(epochs):
+    for i in range(batch_number):
+        inputs = tensor.Tensor(device=dev, data=x_train[
+                               i * batch_sz:(1 + i) * batch_sz], stores_grad=False)
+        targets = tensor.Tensor(device=dev, data=y_train[
+                                i * batch_sz:(1 + i) * batch_sz], requires_grad=False, stores_grad=False)
+
+        loss, y = forward(inputs, targets) # forward the net
+    
+        for p, gp in autograd.backward(loss):  # auto backward
+            sgd.update(p, gp)
+```

http://git-wip-us.apache.org/repos/asf/incubator-singa/blob/8143cf40/doc/en/docs/autograd_doc.md
----------------------------------------------------------------------
diff --git a/doc/en/docs/autograd_doc.md b/doc/en/docs/autograd_doc.md
deleted file mode 100644
index 1ae7833..0000000
--- a/doc/en/docs/autograd_doc.md
+++ /dev/null
@@ -1,241 +0,0 @@
-# singa.autograd
-
-This part will present an overview of how autograd works and give a simple example of neuron
network which is implemented by using autograd API. 
-## Autograd Mechanics
-To get clear about how autograd system works, we should understand three important abstracts
in this system, they are `singa.tensor.Tensor` , `singa.autograd.Operation`, and `singa.autograd.Layer`.
 For briefness, these three classes will be denoted as `tensor`, `operation`, and `layer`.
-### Tensor
-The class `tensor` has three attributes which are important in autograd system, they are
`.creator`, `.requires_grad`, and `.stores_grad`.
--  `tensor.creator` is an `operation` object. It records the particular `operation` which
generates the `tensor ` itself.
--  `.requires_grad` and `.stores_grad` are both boolean indicators. These two attributes
record whether a `tensor` needs gradients and whether gradients of a  `tensor` need to be
stored when do backpropagation. For example, output `tensor` of `Conv2d` needs gradient but
no need to store gradient. In contrast, parameter `tensor` of `Conv2d` not only require gradients
but also need to store gradients. For those input `tensor` of a network, e.g., a batch of
images, since it don't require gradient and don't need to store gradient, both of the two
indicators,  `.requires_grad` and `.stores_grad`, should be set as False.
-It should be noted that if `.stores_grad` is true, then `.requires_grad` must be true, not
vice versa.
-### Operation
-A `operation` takes one or more `tensor` as input, and then output one or more `tensor`.
when a  `operation` is called, mainly two processes happen:
-   1. record source of the `operaiton`. Those inputs `tensor` contain their `creator` information,
which are the source `operation` of current operation. Current `operation` keeps those information
in the attribute `.src`. The designed autograd engine can control backward flow according
to `operation.src`.
-     2. do calculation by calling member function `.forward()`
-
-The class `operation` has two important member functions, `.forward()` and `.backward()`.
These two functions take `tensor.data` as inputs, and output `Ctensor`, which is the same
type with `tensor.data`. To add a specific operation, subclass `operation` should implement
their own `.forward()` and `.backward()`.
-### Layer
-For those operations containing parameters, e.g., the weight or bias tensors, we package
them into a new class, `layer`. Users should initialize a `layer` before invoking it.
-When a `layer` is called, it will send inputs `tensor` together with the parameter `tensor`
to the corresponding operation to construct the computation graph. One layer may call multiple
operations. 
-## Python API
-## Example
-The following codes implement a Xception Net using autograd API. They can be found in source
code of SINGA at 
-  `incubator-singa/examples/autograd/xceptionnet.py`
-### 1.  Import packages
-```
-from singa import autograd
-from singa import tensor
-from singa import device
-from singa import opt
-
-import numpy as np
-from tqdm import trange
-```
-### 2. Create model
-Firstly, we create the basic module, named `Block`, which occurs repeatedly in Xception architecture.
The `Block` class consists of  `SeparableConv2d`, `ReLU`, `BatchNorm2d` and `MaxPool2d`. It
also has linear residual connections. 
-```
-class Block(autograd.Layer):
-
-    def __init__(self, in_filters, out_filters, reps, strides=1, padding=0, start_with_relu=True,
grow_first=True):
-        super(Block, self).__init__()
-
-        if out_filters != in_filters or strides != 1:
-            self.skip = autograd.Conv2d(in_filters, out_filters,
-                                        1, stride=strides, padding=padding, bias=False)
-            self.skipbn = autograd.BatchNorm2d(out_filters)
-        else:
-            self.skip = None
-
-        self.layers = []
-
-        filters = in_filters
-        if grow_first:
-            self.layers.append(autograd.ReLU())
-            self.layers.append(autograd.SeparableConv2d(in_filters, out_filters,
-                                                        3, stride=1, padding=1, bias=False))
-            self.layers.append(autograd.BatchNorm2d(out_filters))
-            filters = out_filters
-
-        for i in range(reps - 1):
-            self.layers.append(autograd.ReLU())
-            self.layers.append(autograd.SeparableConv2d(filters, filters,
-                                                        3, stride=1, padding=1, bias=False))
-            self.layers.append(autograd.BatchNorm2d(filters))
-
-        if not grow_first:
-            self.layers.append(autograd.ReLU())
-            self.layers.append(autograd.SeparableConv2d(in_filters, out_filters,
-                                                        3, stride=1, padding=1, bias=False))
-            self.layers.append(autograd.BatchNorm2d(out_filters))
-
-        if not start_with_relu:
-            self.layers = self.layers[1:]
-        else:
-            self.layers[0] = autograd.ReLU()
-
-        if strides != 1:
-            self.layers.append(autograd.MaxPool2d(3, strides, padding + 1))
-
-    def __call__(self, x):
-        y = self.layers[0](x)
-        for layer in self.layers[1:]:
-            if isinstance(y, tuple):
-                y = y[0]
-            y = layer(y)
-
-        if self.skip is not None:
-            skip = self.skip(x)
-            skip = self.skipbn(skip)
-        else:
-            skip = x
-        y = autograd.add(y, skip)
-        return y
-```
-The second step is to build a `Xception` class. 
-When do initialization, we create all sublayers which containing parameters. 
-In member function `feature()`, we input a `tensor`, which contains information of training
data(images), then `feature()` will output their representations. Those extracted features
will then be sent to `logits` function to do classification. 
-```
-class Xception(autograd.Layer):
-    """
-    Xception optimized for the ImageNet dataset, as specified in
-    https://arxiv.org/pdf/1610.02357.pdf
-    """
-
-    def __init__(self, num_classes=1000):
-        """ Constructor
-        Args:
-            num_classes: number of classes
-        """
-        super(Xception, self).__init__()
-        self.num_classes = num_classes
-
-        self.conv1 = autograd.Conv2d(3, 32, 3, 2, 0, bias=False)
-        self.bn1 = autograd.BatchNorm2d(32)
-
-        self.conv2 = autograd.Conv2d(32, 64, 3, 1, 1, bias=False)
-        self.bn2 = autograd.BatchNorm2d(64)
-
-        self.block1 = Block(
-            64, 128, 2, 2, padding=0, start_with_relu=False, grow_first=True)
-        self.block2 = Block(
-            128, 256, 2, 2, padding=0, start_with_relu=True, grow_first=True)
-        self.block3 = Block(
-            256, 728, 2, 2, padding=0, start_with_relu=True, grow_first=True)
-
-        self.block4 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-        self.block5 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-        self.block6 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-        self.block7 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-
-        self.block8 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-        self.block9 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-        self.block10 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-        self.block11 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-
-        self.block12 = Block(
-            728, 1024, 2, 2, start_with_relu=True, grow_first=False)
-
-        self.conv3 = autograd.SeparableConv2d(1024, 1536, 3, 1, 1)
-        self.bn3 = autograd.BatchNorm2d(1536)
-
-        # do relu here
-        self.conv4 = autograd.SeparableConv2d(1536, 2048, 3, 1, 1)
-        self.bn4 = autograd.BatchNorm2d(2048)
-
-        self.globalpooling = autograd.MaxPool2d(10, 1)
-        self.fc = autograd.Linear(2048, num_classes)
-
-    def features(self, input):
-        x = self.conv1(input)
-        x = self.bn1(x)
-        x = autograd.relu(x)
-
-        x = self.conv2(x)
-        x = self.bn2(x)
-        x = autograd.relu(x)
-
-        x = self.block1(x)
-        x = self.block2(x)
-        x = self.block3(x)
-        x = self.block4(x)
-        x = self.block5(x)
-        x = self.block6(x)
-        x = self.block7(x)
-        x = self.block8(x)
-        x = self.block9(x)
-        x = self.block10(x)
-        x = self.block11(x)
-        x = self.block12(x)
-
-        x = self.conv3(x)
-        x = self.bn3(x)
-        x = autograd.relu(x)
-
-        x = self.conv4(x)
-        x = self.bn4(x)
-        return x
-
-    def logits(self, features):
-        x = autograd.relu(features)
-        x = self.globalpooling(x)
-        x = autograd.flatten(x)
-        x = self.fc(x)
-        return x
-
-    def __call__(self, input):
-        x = self.features(input)
-        x = self.logits(x)
-        return x
-```
-
-We can create a Xception Net by the following command:
-
-`model = Xception(num_classes=1000)`
-
-### 3. Sample data
-Sampling virtual images and labels by numpy.random.
-Those virtual images are in shape (3, 299, 299).
-The training batch size is set as 16.
-To transfer information from numpy array to SINGA `tensor`, We should firstly create SINGA
`tensor`, e.g., tx and ty,  then call their member function `copy_from_numpy`.
-```
-IMG_SIZE = 299
-batch_size = 16
-tx = tensor.Tensor((batch_size, 3, IMG_SIZE, IMG_SIZE), dev)
-ty = tensor.Tensor((batch_size,), dev, tensor.int32)
-x = np.random.randn(batch_size, 3, IMG_SIZE, IMG_SIZE).astype(np.float32)
-y = np.random.randint(0, 1000, batch_size, dtype=np.int32)
-tx.copy_from_numpy(x)
-ty.copy_from_numpy(y)
-```
-
-### 4. Set learning parameters and create optimizer
-The number of iterations is set as 20 while optimizer is chosen as SGD with learning rate=0.1,
momentum=0.9 and weight_decay=1e-5.
-```
-niters = 20
-sgd = opt.SGD(lr=0.1, momentum=0.9, weight_decay=1e-5)
-```
-### 5. Train model
-Set `autograd.training` as true:
-`autograd.training = True`
-
-Then start training:
-```
-with trange(niters) as t:
-        for b in t:
-            x = model(tx)
-            loss = autograd.softmax_cross_entropy(x, ty)
-            for p, g in autograd.backward(loss):
-                sgd.update(p, g)
-```
- 
-
-
-

