From: wangwei@apache.org
To: commits@singa.incubator.apache.org
Date: Fri, 23 Nov 2018 08:41:54 -0000
Subject: [1/3] incubator-singa git commit: SINGA-395 Add documentation for autograd APIs

Repository: incubator-singa
Updated Branches:
  refs/heads/master 99bae0209 -> 3d688be4e


SINGA-395 Add documentation for autograd APIs

updated the doc page for autograd
1. fix some typos
2. change the Xception net example to two simple examples

Project: http://git-wip-us.apache.org/repos/asf/incubator-singa/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-singa/commit/8143cf40
Tree: http://git-wip-us.apache.org/repos/asf/incubator-singa/tree/8143cf40
Diff: http://git-wip-us.apache.org/repos/asf/incubator-singa/diff/8143cf40

Branch: refs/heads/master
Commit: 8143cf4051ee5f141d875cba5a974ca417bc5848
Parents: 4a1b1e2
Author: zmeihui
Authored: Sun Nov 18 14:56:35 2018 +0800
Committer: zmeihui
Committed: Sun Nov 18 14:56:35 2018 +0800

----------------------------------------------------------------------
 doc/en/docs/autograd.md     | 146 ++++++++++++++++++++++++
 doc/en/docs/autograd_doc.md | 241 ---------------------------------------
 2 files changed, 146 insertions(+), 241 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-singa/blob/8143cf40/doc/en/docs/autograd.md
----------------------------------------------------------------------
diff --git a/doc/en/docs/autograd.md b/doc/en/docs/autograd.md
new file mode 100644
index 0000000..6070629
--- /dev/null
+++ b/doc/en/docs/autograd.md
@@ -0,0 +1,146 @@

# Autograd in Singa

There are two typical ways to implement autograd: symbolic differentiation, as in [Theano](http://deeplearning.net/software/theano/index.html), and reverse differentiation, as in [Pytorch](https://pytorch.org/docs/stable/notes/autograd.html). Singa follows the Pytorch approach: it records the computation graph during the forward pass and applies backward propagation automatically afterwards. The autograd algorithm is explained in detail [here](https://pytorch.org/docs/stable/notes/autograd.html). We explain the relevant modules in Singa and give examples to illustrate the usage.

## Relevant Modules

There are three classes involved in autograd, namely `singa.tensor.Tensor`, `singa.autograd.Operation`, and `singa.autograd.Layer`. In the rest of this article, we use tensor, operation and layer to refer to an instance of the respective class.

### Tensor

Three attributes of Tensor are used by autograd:

- `.creator` is an `Operation` instance. It records the operation that generates the Tensor instance.
- `.requires_grad` is a boolean variable. It indicates that the autograd algorithm needs to compute the gradient of the tensor (i.e., of its owner). For example, during backpropagation, the gradients should be computed for the weight matrix of a linear layer and for the feature maps of a convolution layer (as long as it is not the bottom layer).
- `.stores_grad` is a boolean variable. It indicates that the gradient of the owner tensor should be stored and output by the backward function. For example, the gradient of the feature maps is computed during backpropagation, but it is not included in the output of the backward function.

Programmers can change `requires_grad` and `stores_grad` of a Tensor instance. For example, if the latter is set to True, the corresponding gradient is included in the output of the backward function. It should be noted that if `stores_grad` is True, then `requires_grad` must be True as well, but not vice versa.
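For instance, a parameter tensor and a data tensor could be created as follows (a minimal sketch; it only uses the constructor arguments and member functions that also appear in the examples below):

```
from singa.tensor import Tensor

# parameter tensor: its gradient is computed, stored and returned by autograd.backward()
w = Tensor(shape=(2, 3), requires_grad=True, stores_grad=True)
w.gaussian(0.0, 0.1)

# data tensor: no gradient is needed for the input data itself
x = Tensor(shape=(4, 2), requires_grad=False, stores_grad=False)
x.set_value(1.0)
```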
### Operation

An `Operation` takes one or more `Tensor` instances as input and outputs one or more `Tensor` instances. For example, ReLU can be implemented as a specific Operation subclass. When an `Operation` instance is called (after instantiation), the following two steps are executed:

1. Record the source operations, i.e., the `creator`s of the input tensors.
2. Do the calculation by calling the member function `.forward()`.

There are two member functions for the forward and backward passes, i.e., `.forward()` and `.backward()`. They take `Tensor.data` as inputs (the type is `CTensor`) and output `CTensor`s. To add a specific operation, a subclass of `Operation` should implement its own `.forward()` and `.backward()`. The `backward()` function is called automatically by `autograd.backward()` during backward propagation to compute the gradients of the inputs (according to the `requires_grad` field).

### Layer

For those operations that require parameters, we package them into a new class, `Layer`. For example, the convolution operation is wrapped into a convolution layer. A `Layer` manages (stores) the parameters and calls the corresponding `Operation`s to implement the transformation.

## Examples

Multiple examples are provided in the [example folder](https://github.com/apache/incubator-singa/tree/master/examples/autograd). We explain two representative examples here.

### Operation only

The following code implements an MLP model using only Operation instances (no Layer instances).

#### Import packages

```
from singa.tensor import Tensor
from singa import autograd
from singa import opt
```

#### Create weight matrix and bias vector

The parameter tensors are created with both `requires_grad` and `stores_grad` set to True.

```
w0 = Tensor(shape=(2, 3), requires_grad=True, stores_grad=True)
w0.gaussian(0.0, 0.1)
b0 = Tensor(shape=(1, 3), requires_grad=True, stores_grad=True)
b0.set_value(0.0)

w1 = Tensor(shape=(3, 2), requires_grad=True, stores_grad=True)
w1.gaussian(0.0, 0.1)
b1 = Tensor(shape=(1, 2), requires_grad=True, stores_grad=True)
b1.set_value(0.0)
```

#### Training

```
inputs = Tensor(data=data)    # data matrix
target = Tensor(data=label)   # label vector
autograd.training = True      # for training
sgd = opt.SGD(0.05)           # optimizer

for i in range(10):
    x = autograd.matmul(inputs, w0)  # matrix multiplication
    x = autograd.add_bias(x, b0)     # add the bias vector
    x = autograd.relu(x)             # ReLU activation operation

    x = autograd.matmul(x, w1)
    x = autograd.add_bias(x, b1)

    loss = autograd.softmax_cross_entropy(x, target)

    for p, g in autograd.backward(loss):
        sgd.update(p, g)
```


### Operation + Layer

The following [example](https://github.com/apache/incubator-singa/blob/master/examples/autograd/mnist_cnn.py) implements a CNN model using layers provided by the autograd module.

#### Create the layers

```
conv1 = autograd.Conv2d(1, 32, 3, padding=1, bias=False)
bn1 = autograd.BatchNorm2d(32)
pooling1 = autograd.MaxPool2d(3, 1, padding=1)
conv21 = autograd.Conv2d(32, 16, 3, padding=1)
conv22 = autograd.Conv2d(32, 16, 3, padding=1)
bn2 = autograd.BatchNorm2d(32)
linear = autograd.Linear(32 * 28 * 28, 10)
pooling2 = autograd.AvgPool2d(3, 1, padding=1)
```
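As a quick sanity check of the layer setup, a layer can be called directly on a tensor. The snippet below is a hypothetical sketch, not part of mnist_cnn.py; it assumes a device obtained via `device.get_default_device()` (use `device.create_cuda_gpu()` if convolution is only available on the GPU build) and a random MNIST-shaped batch:

```
from singa import device, tensor

dev = device.get_default_device()        # default host device (assumption: supports Conv2d)
x = tensor.Tensor((4, 1, 28, 28), dev)   # a dummy batch of four 1x28x28 images
x.gaussian(0.0, 1.0)

autograd.training = True                 # record the graph while calling layers
y = conv1(x)                             # a Layer call runs the Operation and sets y.creator
print(y.shape)                           # expected (4, 32, 28, 28): padding=1 keeps the spatial size
```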
#### Define the forward function

The operations in the forward pass will be recorded automatically for backward propagation.

```
def forward(x, t):
    # x is the input data (a batch of images)
    # t is the label vector (a batch of integers)
    y = conv1(x)            # Conv layer
    y = autograd.relu(y)    # ReLU operation
    y = bn1(y)              # BN layer
    y = pooling1(y)         # Pooling layer

    # two parallel convolution layers
    y1 = conv21(y)
    y2 = conv22(y)
    y = autograd.cat((y1, y2), 1)  # cat operation
    y = autograd.relu(y)           # ReLU operation
    y = bn2(y)
    y = pooling2(y)

    y = autograd.flatten(y)  # flatten operation
    y = linear(y)            # Linear layer
    loss = autograd.softmax_cross_entropy(y, t)  # operation
    return loss, y
```

#### Training

```
autograd.training = True
for epoch in range(epochs):
    for i in range(batch_number):
        inputs = tensor.Tensor(device=dev, data=x_train[i * batch_sz:(1 + i) * batch_sz],
                               stores_grad=False)
        targets = tensor.Tensor(device=dev, data=y_train[i * batch_sz:(1 + i) * batch_sz],
                                requires_grad=False, stores_grad=False)

        loss, y = forward(inputs, targets)  # forward the net

        for p, gp in autograd.backward(loss):  # auto backward
            sgd.update(p, gp)
```
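The loop above relies on a few names that are defined elsewhere in the full example script (`dev`, `x_train`, `y_train`, `batch_sz`, `batch_number`, `epochs`, and the optimizer `sgd`). A minimal stand-in setup with random data (hypothetical values, not taken from the original script) could look like this:

```
import numpy as np
from singa import device, opt, tensor

dev = device.get_default_device()  # or device.create_cuda_gpu() for a GPU
x_train = np.random.randn(512, 1, 28, 28).astype(np.float32)           # stand-in images
y_train = np.eye(10, dtype=np.float32)[np.random.randint(0, 10, 512)]  # one-hot labels
batch_sz = 64
batch_number = x_train.shape[0] // batch_sz
epochs = 2
sgd = opt.SGD(0.05)  # same optimizer class as in the MLP example
```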
http://git-wip-us.apache.org/repos/asf/incubator-singa/blob/8143cf40/doc/en/docs/autograd_doc.md
----------------------------------------------------------------------
diff --git a/doc/en/docs/autograd_doc.md b/doc/en/docs/autograd_doc.md
deleted file mode 100644
index 1ae7833..0000000
--- a/doc/en/docs/autograd_doc.md
+++ /dev/null
@@ -1,241 +0,0 @@

# singa.autograd

This part presents an overview of how autograd works and gives a simple example of a neural network implemented with the autograd API.

## Autograd Mechanics

To understand how the autograd system works, we should understand three important abstractions in this system: `singa.tensor.Tensor`, `singa.autograd.Operation`, and `singa.autograd.Layer`. For brevity, these three classes are denoted as `tensor`, `operation`, and `layer`.

### Tensor

The class `tensor` has three attributes that are important to the autograd system: `.creator`, `.requires_grad`, and `.stores_grad`.

- `tensor.creator` is an `operation` object. It records the particular `operation` that generates the `tensor` itself.
- `.requires_grad` and `.stores_grad` are both boolean indicators. They record whether a `tensor` needs gradients and whether the gradients of a `tensor` need to be stored during backpropagation. For example, the output `tensor` of `Conv2d` needs a gradient but does not need to store it. In contrast, the parameter `tensor`s of `Conv2d` not only require gradients but also need to store them. For the input `tensor`s of a network, e.g., a batch of images, which neither require nor store gradients, both indicators, `.requires_grad` and `.stores_grad`, should be set to False.

It should be noted that if `.stores_grad` is true, then `.requires_grad` must be true, but not vice versa.

### Operation

An `operation` takes one or more `tensor`s as input and then outputs one or more `tensor`s. When an `operation` is called, mainly two processes happen:

1. Record the sources of the `operation`. The input `tensor`s carry their `creator` information, which identifies the source `operation`s of the current operation. The current `operation` keeps this information in the attribute `.src`. The autograd engine controls the backward flow according to `operation.src`.
2. Do the calculation by calling the member function `.forward()`.

The class `operation` has two important member functions, `.forward()` and `.backward()`. These two functions take `tensor.data` as inputs and output `CTensor`s, which are the same type as `tensor.data`. To add a specific operation, a subclass of `operation` should implement its own `.forward()` and `.backward()`.

### Layer

For those operations containing parameters, e.g., weight or bias tensors, we package them into a new class, `layer`. Users should initialize a `layer` before invoking it. When a `layer` is called, it sends the input `tensor`s together with the parameter `tensor`s to the corresponding operations to construct the computation graph. One layer may call multiple operations.

## Python API

## Example

The following code implements an Xception net using the autograd API. It can be found in the SINGA source code at `incubator-singa/examples/autograd/xceptionnet.py`.

### 1. Import packages

```
from singa import autograd
from singa import tensor
from singa import device
from singa import opt

import numpy as np
from tqdm import trange
```

### 2. Create model

First, we create the basic module, named `Block`, which occurs repeatedly in the Xception architecture. The `Block` class consists of `SeparableConv2d`, `ReLU`, `BatchNorm2d` and `MaxPool2d`. It also has linear residual connections.

```
class Block(autograd.Layer):

    def __init__(self, in_filters, out_filters, reps, strides=1, padding=0,
                 start_with_relu=True, grow_first=True):
        super(Block, self).__init__()

        if out_filters != in_filters or strides != 1:
            self.skip = autograd.Conv2d(in_filters, out_filters,
                                        1, stride=strides, padding=padding, bias=False)
            self.skipbn = autograd.BatchNorm2d(out_filters)
        else:
            self.skip = None

        self.layers = []

        filters = in_filters
        if grow_first:
            self.layers.append(autograd.ReLU())
            self.layers.append(autograd.SeparableConv2d(in_filters, out_filters,
                                                        3, stride=1, padding=1, bias=False))
            self.layers.append(autograd.BatchNorm2d(out_filters))
            filters = out_filters

        for i in range(reps - 1):
            self.layers.append(autograd.ReLU())
            self.layers.append(autograd.SeparableConv2d(filters, filters,
                                                        3, stride=1, padding=1, bias=False))
            self.layers.append(autograd.BatchNorm2d(filters))

        if not grow_first:
            self.layers.append(autograd.ReLU())
            self.layers.append(autograd.SeparableConv2d(in_filters, out_filters,
                                                        3, stride=1, padding=1, bias=False))
            self.layers.append(autograd.BatchNorm2d(out_filters))

        if not start_with_relu:
            self.layers = self.layers[1:]
        else:
            self.layers[0] = autograd.ReLU()

        if strides != 1:
            self.layers.append(autograd.MaxPool2d(3, strides, padding + 1))

    def __call__(self, x):
        y = self.layers[0](x)
        for layer in self.layers[1:]:
            if isinstance(y, tuple):
                y = y[0]
            y = layer(y)

        if self.skip is not None:
            skip = self.skip(x)
            skip = self.skipbn(skip)
        else:
            skip = x
        y = autograd.add(y, skip)
        return y
```

The second step is to build an `Xception` class. During initialization, we create all the sublayers that contain parameters. The member function `features()` takes a `tensor` holding the training data (images) and outputs their representations. The extracted features are then sent to the `logits()` function for classification.
```
class Xception(autograd.Layer):
    """
    Xception optimized for the ImageNet dataset, as specified in
    https://arxiv.org/pdf/1610.02357.pdf
    """

    def __init__(self, num_classes=1000):
        """ Constructor
        Args:
            num_classes: number of classes
        """
        super(Xception, self).__init__()
        self.num_classes = num_classes

        self.conv1 = autograd.Conv2d(3, 32, 3, 2, 0, bias=False)
        self.bn1 = autograd.BatchNorm2d(32)

        self.conv2 = autograd.Conv2d(32, 64, 3, 1, 1, bias=False)
        self.bn2 = autograd.BatchNorm2d(64)

        self.block1 = Block(
            64, 128, 2, 2, padding=0, start_with_relu=False, grow_first=True)
        self.block2 = Block(
            128, 256, 2, 2, padding=0, start_with_relu=True, grow_first=True)
        self.block3 = Block(
            256, 728, 2, 2, padding=0, start_with_relu=True, grow_first=True)

        self.block4 = Block(
            728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block5 = Block(
            728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block6 = Block(
            728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block7 = Block(
            728, 728, 3, 1, start_with_relu=True, grow_first=True)

        self.block8 = Block(
            728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block9 = Block(
            728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block10 = Block(
            728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block11 = Block(
            728, 728, 3, 1, start_with_relu=True, grow_first=True)

        self.block12 = Block(
            728, 1024, 2, 2, start_with_relu=True, grow_first=False)

        self.conv3 = autograd.SeparableConv2d(1024, 1536, 3, 1, 1)
        self.bn3 = autograd.BatchNorm2d(1536)

        # do relu here
        self.conv4 = autograd.SeparableConv2d(1536, 2048, 3, 1, 1)
        self.bn4 = autograd.BatchNorm2d(2048)

        self.globalpooling = autograd.MaxPool2d(10, 1)
        self.fc = autograd.Linear(2048, num_classes)

    def features(self, input):
        x = self.conv1(input)
        x = self.bn1(x)
        x = autograd.relu(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = autograd.relu(x)

        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)
        x = self.block6(x)
        x = self.block7(x)
        x = self.block8(x)
        x = self.block9(x)
        x = self.block10(x)
        x = self.block11(x)
        x = self.block12(x)

        x = self.conv3(x)
        x = self.bn3(x)
        x = autograd.relu(x)

        x = self.conv4(x)
        x = self.bn4(x)
        return x

    def logits(self, features):
        x = autograd.relu(features)
        x = self.globalpooling(x)
        x = autograd.flatten(x)
        x = self.fc(x)
        return x

    def __call__(self, input):
        x = self.features(input)
        x = self.logits(x)
        return x
```

We can create an Xception net with the following command:

`model = Xception(num_classes=1000)`

### 3. Sample data

We sample virtual images and labels with numpy.random. The virtual images have shape (3, 299, 299), and the training batch size is set to 16. To transfer the data from a numpy array to a SINGA `tensor`, we first create the SINGA `tensor`s, e.g., tx and ty, and then call their member function `copy_from_numpy`.

```
IMG_SIZE = 299
batch_size = 16
tx = tensor.Tensor((batch_size, 3, IMG_SIZE, IMG_SIZE), dev)
ty = tensor.Tensor((batch_size,), dev, tensor.int32)
x = np.random.randn(batch_size, 3, IMG_SIZE, IMG_SIZE).astype(np.float32)
y = np.random.randint(0, 1000, batch_size, dtype=np.int32)
tx.copy_from_numpy(x)
ty.copy_from_numpy(y)
```
### 4. Set learning parameters and create the optimizer

The number of iterations is set to 20, and the optimizer is SGD with learning rate 0.1, momentum 0.9 and weight_decay 1e-5.

```
niters = 20
sgd = opt.SGD(lr=0.1, momentum=0.9, weight_decay=1e-5)
```

### 5. Train model

Set `autograd.training` to True:

`autograd.training = True`

Then start training:

```
with trange(niters) as t:
    for b in t:
        x = model(tx)
        loss = autograd.softmax_cross_entropy(x, ty)
        for p, g in autograd.backward(loss):
            sgd.update(p, g)
```