singa-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [singa] chrishkchris opened a new pull request #566: SINGA-487 Add Sparsification Algorithm: Threshold Quantization
Date Thu, 05 Dec 2019 15:40:07 GMT
chrishkchris opened a new pull request #566: SINGA-487 Add Sparsification Algorithm: Threshold
Quantization
URL: https://github.com/apache/singa/pull/566
 
 
   This PR implements a simple sparsification scheme: we transfer only the gradient values whose absolute magnitude exceeds a threshold. Because we use the CUDA Thrust parallel algorithms to convert the dense matrix into a sparse representation, the overhead is relatively low.
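   The thresholding idea can be sketched in NumPy as follows. This is an illustrative sketch only, with hypothetical helper names (`sparsify`, `densify`); the actual PR performs the dense-to-sparse compaction on the GPU with CUDA Thrust.

   ```python
   import numpy as np

   def sparsify(grad, threshold):
       # Keep only entries whose magnitude exceeds the absolute threshold.
       # The (indices, values) pairs are what would be transferred in place
       # of the full dense gradient.
       mask = np.abs(grad) > threshold
       indices = np.nonzero(mask)[0]
       return indices, grad[indices]

   def densify(indices, values, size):
       # Reconstruct a dense gradient from the transferred sparse pairs;
       # entries below the threshold are treated as zero.
       dense = np.zeros(size, dtype=values.dtype)
       dense[indices] = values
       return dense

   grad = np.array([0.01, -0.5, 0.02, 0.8, -0.03], dtype=np.float32)
   idx, vals = sparsify(grad, threshold=0.1)
   restored = densify(idx, vals, grad.size)
   ```

   With a well-chosen threshold, most entries are dropped, so the transferred index/value pairs are much smaller than the dense gradient.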
   
   Some reference papers for the sparsification:
   [1] N. Strom. Scalable distributed DNN training using commodity GPU cloud computing. In Proceedings of InterSpeech 2015. International Speech Communication Association (ISCA), September 2015.
   [2] A. F. Aji and K. Heafield. Sparse communication for distributed gradient descent. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), pages 440-445. Association for Computational Linguistics (ACL), September 2017.
   
   I have added an example file, sparsification_mnist.py, to test the accuracy. The following results are based on a 4-GPU AWS g4dn.12xlarge instance with NVIDIA T4 GPUs.
   
   ```
   ubuntu@ip-172-31-20-160:~/singa/examples/autograd$ python3 sparsification_mnist.py
   Starting Epoch 0:
   Training loss = 809.631958, training accuracy = 0.709352
   Evaluation accuracy = 0.905849, Elapsed Time = 1.251285s
   Starting Epoch 1:
   Training loss = 325.436279, training accuracy = 0.888906
   Evaluation accuracy = 0.936098, Elapsed Time = 0.882350s
   Starting Epoch 2:
   Training loss = 238.643738, training accuracy = 0.920106
   Evaluation accuracy = 0.952424, Elapsed Time = 0.847908s
   Starting Epoch 3:
   Training loss = 200.181030, training accuracy = 0.933377
   Evaluation accuracy = 0.947616, Elapsed Time = 0.839072s
   Starting Epoch 4:
   Training loss = 182.340820, training accuracy = 0.938969
   Evaluation accuracy = 0.962240, Elapsed Time = 0.836915s
   Starting Epoch 5:
   Training loss = 161.267120, training accuracy = 0.946615
   Evaluation accuracy = 0.970653, Elapsed Time = 0.839940s
   Starting Epoch 6:
   Training loss = 147.990921, training accuracy = 0.951356
   Evaluation accuracy = 0.970753, Elapsed Time = 0.842795s
   Starting Epoch 7:
   Training loss = 139.301285, training accuracy = 0.953626
   Evaluation accuracy = 0.973458, Elapsed Time = 0.842011s
   Starting Epoch 8:
   Training loss = 131.042053, training accuracy = 0.956564
   Evaluation accuracy = 0.963241, Elapsed Time = 0.840951s
   Starting Epoch 9:
   Training loss = 126.376511, training accuracy = 0.957732
   Evaluation accuracy = 0.967448, Elapsed Time = 0.841526s
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
