singa-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [singa] chrishkchris commented on a change in pull request #651: Simply example APIs
Date Sat, 04 Apr 2020 14:56:34 GMT
chrishkchris commented on a change in pull request #651: Simply example APIs
URL: https://github.com/apache/singa/pull/651#discussion_r403479208
 
 

 ##########
 File path: examples/autograd/cifar10_multiprocess.py
 ##########
 @@ -17,28 +17,16 @@
 # under the License.
 #
 
-from singa import opt
 from resnet_cifar10 import *
 import multiprocessing
 
-
-def data_partition(dataset_x, dataset_y, rank_in_global, world_size):
-    data_per_rank = dataset_x.shape[0] // world_size
-    idx_start = rank_in_global * data_per_rank
-    idx_end = (rank_in_global + 1) * data_per_rank
-    return dataset_x[idx_start:idx_end], dataset_y[idx_start:idx_end]
-
-
 if __name__ == '__main__':
 
     # Generate a NCCL ID to be used for collective communication
     nccl_id = singa.NcclIdHolder()
 
-    sgd = opt.SGD(lr=0.005, momentum=0.9, weight_decay=1e-5)
-
-    gpu_per_node = 4
-    max_epoch = 100
-    batch_size = 32
+    # number of GPUs to be used
+    gpu_per_node = 2
 
 Review comment:
   I added a command-line argument:
   
   ```
   root@877a3759b148:~/dcsysh/singa/examples/autograd# python3 mnist_multiprocess.py 2
   Starting Epoch 0:
   Training loss = 1029.400879, training accuracy = 0.628155
   Evaluation accuracy = 0.882712, Elapsed Time = 1.923681s
   Starting Epoch 1:
   Training loss = 367.118927, training accuracy = 0.874215
   Evaluation accuracy = 0.939804, Elapsed Time = 1.986267s
   Starting Epoch 2:
   Training loss = 270.464111, training accuracy = 0.908504
   Evaluation accuracy = 0.935497, Elapsed Time = 1.985472s
   Starting Epoch 3:
   Training loss = 215.307678, training accuracy = 0.927017
   Evaluation accuracy = 0.958333, Elapsed Time = 1.894766s
   Starting Epoch 4:
   Training loss = 179.509125, training accuracy = 0.940371
   Evaluation accuracy = 0.965244, Elapsed Time = 1.889032s
   Starting Epoch 5:
   Training loss = 157.717468, training accuracy = 0.947282
   Evaluation accuracy = 0.962640, Elapsed Time = 1.892571s
   Starting Epoch 6:
   Training loss = 143.918320, training accuracy = 0.952224
   Evaluation accuracy = 0.968950, Elapsed Time = 1.910212s
   Starting Epoch 7:
   Training loss = 134.035339, training accuracy = 0.954911
   Evaluation accuracy = 0.957332, Elapsed Time = 1.977645s
   Starting Epoch 8:
   Training loss = 121.463905, training accuracy = 0.959352
   Evaluation accuracy = 0.970252, Elapsed Time = 1.944209s
   Starting Epoch 9:
   Training loss = 112.466797, training accuracy = 0.962457
   Evaluation accuracy = 0.974259, Elapsed Time = 1.883536s
   ```
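
For readers following along, here is a minimal sketch of how a positional command-line argument like the `2` in the run above could be read. This message does not show the PR's actual implementation, so everything here is an assumption: the helper `parse_gpu_count` is hypothetical, the name `gpu_per_node` is borrowed from the diff, and the parsing uses the standard-library `argparse` module rather than anything SINGA-specific.

```python
import argparse


def parse_gpu_count(argv=None):
    """Hypothetical helper: read the number of GPUs from the command line."""
    parser = argparse.ArgumentParser(
        description="Sketch: read the GPU count for a multiprocess training script")
    # Positional integer, mirroring `python3 mnist_multiprocess.py 2`.
    # The argument name is an assumption taken from the diff's variable.
    parser.add_argument("gpu_per_node", type=int,
                        help="number of GPUs to be used")
    args = parser.parse_args(argv)
    if args.gpu_per_node < 1:
        # parser.error prints a usage message and exits with status 2.
        parser.error("gpu_per_node must be at least 1")
    return args.gpu_per_node


# Passing an explicit list stands in for the real command line here.
print(parse_gpu_count(["2"]))
```

Calling `parse_gpu_count()` with no arguments would read `sys.argv` instead, so the hard-coded `gpu_per_node = 2` in the diff could be replaced by the returned value.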

