singa-dev mailing list archives

From GitBox <>
Subject [GitHub] [incubator-singa] chrishkchris commented on a change in pull request #468: Distributted module
Date Thu, 01 Aug 2019 13:55:50 GMT
chrishkchris commented on a change in pull request #468: Distributted module

 File path: src/api/config.i
 @@ -0,0 +1,33 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// Pass in cmake configurations to swig
+#define USE_CUDA 1
+#define USE_CUDNN 1
+#define USE_OPENCL 0
+#define USE_PYTHON 1
+#define USE_MKLDNN 1
+#define USE_JAVA 0
+#define CUDNN_VERSION 7401
+// SINGA version
 Review comment:
   Updated on 1 August 2019:
   Concerning the above error, I found that there is a difference between the implementations of `class _BatchNorm2d(Operation):` in the master branch and the dist_new branch.
   Both the master branch and the dist_new branch have modified (or debugged) the conv2d and batchnorm operators, but they modified them differently. The conv2d in both branches can train and reduce the loss of the mnist simple CNN, so there is no big problem there. However, batch normalization is a more complex case, because it involves non-training variables: the running means and running variances.
   In the master branch, the running means and running variances (non-training variables) are passed as arguments to the forward function: `def forward(self, x, scale, bias, running_mean, running_var):`
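To illustrate why the running statistics are different in kind from scale and bias, here is a toy NumPy sketch of batch norm (not SINGA's actual implementation; the function name and momentum convention are assumptions for illustration). Scale and bias are trainable parameters, while the running mean and variance are buffers that are only updated in place, never differentiated:

```python
import numpy as np

def batchnorm_forward(x, scale, bias, running_mean, running_var,
                      momentum=0.9, eps=1e-5, training=True):
    """Toy batch-norm forward pass (hypothetical sketch, not SINGA code).

    scale/bias are trainable parameters; running_mean/running_var are
    non-training buffers updated as side effects during training.
    """
    if training:
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # Update the non-trainable running statistics in place;
        # no gradient ever flows through these buffers.
        running_mean *= momentum
        running_mean += (1 - momentum) * mean
        running_var *= momentum
        running_var += (1 - momentum) * var
    else:
        # At inference time the stored statistics are used instead.
        mean, var = running_mean, running_var
    x_hat = (x - mean) / np.sqrt(var + eps)
    return scale * x_hat + bias
```

The key point is that only `scale` and `bias` would ever receive gradients; the two buffers are pure state.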
   When I run the code using the master branch dockerfile, the error is as follows:
   root@26c9db193eb0:~/incubator-singa/examples/autograd# python3
   Start intialization............
                    | 0/200 [00:00<?, ?it/s]
   Traceback (most recent call last):
     File "", line 249, in <module>
       for p, g in autograd.backward(loss):
     File "/root/incubator-singa/build/python/singa/", line 135, in backward
       % (len(op.src), len(dxs))
   AssertionError: the number of src ops (=5) and dx (=3) not match
   I think the error occurs because running_mean and running_var appear among the forward function's input arguments but are not training variables, so there should be three src ops, but five were found.
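The mismatch can be reproduced with a hypothetical simplification of the consistency check that `autograd.backward` performs (the function name `check_grads` is mine, not SINGA's): every input recorded as a src op must receive exactly one gradient.

```python
def check_grads(op_srcs, dxs):
    """Hypothetical simplification of the check in autograd.backward:
    each recorded src op must be matched by one returned gradient."""
    assert len(op_srcs) == len(dxs), \
        "the number of src ops (=%d) and dx (=%d) not match" \
        % (len(op_srcs), len(dxs))

# Forward recorded five inputs as src ops, including the two buffers...
srcs = ["x", "scale", "bias", "running_mean", "running_var"]
# ...but backward only produces gradients for the three trainable ones.
grads = ["dx", "dscale", "dbias"]
try:
    check_grads(srcs, grads)
except AssertionError as exc:
    print(exc)  # the number of src ops (=5) and dx (=3) not match
```

This matches the traceback above: the two non-training buffers inflate the src-op count from three to five.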
   Meanwhile, the dist_new branch has modified the batchnorm function (commit 2b3a857 by user ubuntu on Apr 14) by moving the input arguments running_mean and running_var into the initialization:
   `def __init__(self, handle, running_mean, running_var, name=None):`
   `def forward(self, x, scale, bias):`
   This one can run successfully, but I am not sure whether it can also train and reduce the loss.
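The shape of that refactor can be sketched as follows (a toy NumPy class with hypothetical names, omitting the `handle` and `name` arguments; not the actual SINGA operation, and the `dx` term is a placeholder): binding the buffers in `__init__` means `forward` only sees the three differentiable inputs, so `backward` can return exactly three gradients.

```python
import numpy as np

class ToyBatchNorm:
    """Sketch of the dist_new-style signature (hypothetical class):
    running statistics are bound at construction time, so forward()
    only records x, scale and bias as src ops."""

    def __init__(self, running_mean, running_var, momentum=0.9, eps=1e-5):
        self.running_mean = running_mean
        self.running_var = running_var
        self.momentum = momentum
        self.eps = eps

    def forward(self, x, scale, bias):
        mean, var = x.mean(axis=0), x.var(axis=0)
        # Buffers are updated as internal state, not graph inputs.
        self.running_mean = self.momentum * self.running_mean \
            + (1 - self.momentum) * mean
        self.running_var = self.momentum * self.running_var \
            + (1 - self.momentum) * var
        self.x_hat = (x - mean) / np.sqrt(var + self.eps)
        return scale * self.x_hat + bias

    def backward(self, dy):
        # Three forward inputs -> exactly three gradients out.
        dscale = (dy * self.x_hat).sum(axis=0)
        dbias = dy.sum(axis=0)
        dx = dy  # placeholder: the real dx needs the full batch-norm chain rule
        return dx, dscale, dbias
```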
   Next, I will try training the resnet with a real dataset to see if it can reduce the loss.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services
