singa-dev mailing list archives

From: GitBox <...@apache.org>
Subject: [GitHub] [singa] chrishkchris opened a new pull request #566: SINGA-487 Add Sparsification Algorithms
Date: Wed, 18 Dec 2019 10:16:31 GMT
URL: https://github.com/apache/singa/pull/566
 
 
   This PR implements several sparsification schemes: we transfer only the gradient elements that are significant. Because we use the CUDA Thrust parallel algorithms to convert the dense array into a sparse array, the overhead is relatively low.
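
   As a rough illustration of this dense-to-sparse step, here is a minimal NumPy sketch of what the Thrust-based conversion does conceptually (dense_to_sparse is an illustrative name, not the PR's API):

   ```python
   import numpy as np

   def dense_to_sparse(grad, threshold):
       # Significance test: keep elements whose magnitude exceeds the threshold
       mask = np.abs(grad) > threshold
       indices = np.nonzero(mask)[0]   # positions of the significant elements
       values = grad[indices]          # their values
       return indices, values          # the (index, value) pair actually transmitted

   grad = np.array([0.01, -0.8, 0.002, 0.5, -0.03])
   idx, vals = dense_to_sparse(grad, threshold=0.1)
   print(idx, vals)  # -> [1 3] [-0.8  0.5]
   ```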
   
   It supports two modes, controlled by the flag topK:
   1. When topK is False, it transmits the gradient elements that are greater than an absolute threshold value.
   2. When topK is True, it transmits the K largest gradient elements, where K equals the total number of elements multiplied by the spars factor.
   Moreover, there is a flag corr that uses the locally accumulated gradient for correction. This flag is True by default, because local accumulated-gradient correction is commonly used in sparsification.
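
   A minimal NumPy sketch of both modes plus the corr accumulation (select_sparse and residual are illustrative names; reusing spars as the absolute threshold in the non-topK branch is also an assumption):

   ```python
   import numpy as np

   def select_sparse(grad, residual, spars=0.05, topK=False, corr=True):
       if corr:
           grad = grad + residual                     # apply locally accumulated gradient
       if topK:
           k = max(1, int(grad.size * spars))         # K = total elements * spars factor
           idx = np.argsort(np.abs(grad))[-k:]        # indices of the K largest magnitudes
       else:
           idx = np.nonzero(np.abs(grad) > spars)[0]  # absolute-threshold mode
       sent = np.zeros_like(grad)
       sent[idx] = grad[idx]
       if corr:
           residual[:] = grad - sent                  # keep the untransmitted part locally
       return idx, grad[idx]

   grad = np.random.randn(10).astype(np.float32)
   residual = np.zeros_like(grad)
   idx, vals = select_sparse(grad, residual, spars=0.3, topK=True)
   ```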
   
   Some reference papers for the sparsification:
   [1] N. Strom. Scalable distributed DNN training using commodity GPU cloud computing. In Proceedings of InterSpeech 2015. International Speech Communication Association (ISCA), September 2015.
   [2] A. F. Aji and K. Heafield. Sparse communication for distributed gradient descent. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), pages 440-445. Association for Computational Linguistics (ACL), September 2017.
   
   I have added an example file sparsification_mnist.py to test the accuracy. The following results are based on an AWS p2.8xlarge instance with 8 K80 GPUs (a simplified sketch of the training step follows the log below).
   
   ```
   ubuntu@ip-172-31-18-216:~/singa/examples/autograd$ python3 sparsification_mnist.py
   Starting Epoch 0:
   Training loss = 1237.824951, training accuracy = 0.537627
   Evaluation accuracy = 0.831209, Elapsed Time = 1.364238s
   Starting Epoch 1:
   Training loss = 468.859161, training accuracy = 0.835053
   Evaluation accuracy = 0.931229, Elapsed Time = 0.687484s
   Starting Epoch 2:
   Training loss = 329.488220, training accuracy = 0.887604
   Evaluation accuracy = 0.949424, Elapsed Time = 0.713595s
   Starting Epoch 3:
   Training loss = 220.463303, training accuracy = 0.925731
   Evaluation accuracy = 0.955592, Elapsed Time = 0.686450s
   Starting Epoch 4:
   Training loss = 171.178146, training accuracy = 0.942141
   Evaluation accuracy = 0.961760, Elapsed Time = 0.686534s
   Starting Epoch 5:
   Training loss = 149.635681, training accuracy = 0.950237
   Evaluation accuracy = 0.974198, Elapsed Time = 0.686791s
   Starting Epoch 6:
   Training loss = 124.092453, training accuracy = 0.958300
   Evaluation accuracy = 0.973376, Elapsed Time = 0.686136s
   Starting Epoch 7:
   Training loss = 115.288582, training accuracy = 0.961205
   Evaluation accuracy = 0.968647, Elapsed Time = 0.686174s
   Starting Epoch 8:
   Training loss = 99.048584, training accuracy = 0.966864
   Evaluation accuracy = 0.981188, Elapsed Time = 0.685848s
   Starting Epoch 9:
   Training loss = 84.038574, training accuracy = 0.972239
   Evaluation accuracy = 0.981188, Elapsed Time = 0.685568s
   ```
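
   For readers who do not want to open the file, a simplified sketch of the training step (model, train_batches and max_epoch stand in for the real MNIST setup, and the backward_and_sparse_update entry point and its parameter names may differ from the final code):

   ```python
   from singa import autograd, opt

   sgd = opt.SGD(lr=0.04)
   sgd = opt.DistOpt(sgd)   # distributed optimizer wrapping plain SGD

   for epoch in range(max_epoch):
       for x_batch, y_batch in train_batches:
           out = model(x_batch)                                # forward pass
           loss = autograd.softmax_cross_entropy(out, y_batch)
           # sparse all-reduce: only significant gradient elements are sent
           sgd.backward_and_sparse_update(loss, spars=0.05,
                                          topK=False, corr=True)
   ```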

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
