singa-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [singa] chrishkchris opened a new pull request #762: Fix training loss error
Date Thu, 09 Jul 2020 09:38:37 GMT

chrishkchris opened a new pull request #762:
URL: https://github.com/apache/singa/pull/762


   This fixes an error in the training loss. In distributed training on the dev branch, the
training loss no longer matches the expected values that I have used as a reference for a long time.
   
   Before fix:
   ```
   root@64926e30597f:~/dcsysh/singa/examples/cnn# mpiexec -np 3 python3 train_mpi.py cnn mnist -l 0.015
   Starting Epoch 0:
   Training loss = 867.269531, training accuracy = 0.682409
   Evaluation accuracy = 0.913662, Elapsed Time = 1.374367s
   Starting Epoch 1:
   Training loss = 312.582123, training accuracy = 0.893546
   Evaluation accuracy = 0.946014, Elapsed Time = 1.324747s
   Starting Epoch 2:
   Training loss = 223.973038, training accuracy = 0.924312
   Evaluation accuracy = 0.955629, Elapsed Time = 1.325152s
   Starting Epoch 3:
   Training loss = 176.310730, training accuracy = 0.939804
   Evaluation accuracy = 0.965645, Elapsed Time = 1.327019s
   Starting Epoch 4:
   Training loss = 146.806168, training accuracy = 0.950220
   Evaluation accuracy = 0.969451, Elapsed Time = 1.320603s
   Starting Epoch 5:
   Training loss = 124.658463, training accuracy = 0.958784
   Evaluation accuracy = 0.970653, Elapsed Time = 1.317975s
   Starting Epoch 6:
   Training loss = 112.322250, training accuracy = 0.962724
   Evaluation accuracy = 0.972857, Elapsed Time = 1.343767s
   Starting Epoch 7:
   Training loss = 102.903122, training accuracy = 0.965044
   Evaluation accuracy = 0.971254, Elapsed Time = 1.316032s
   Starting Epoch 8:
   Training loss = 96.206215, training accuracy = 0.967798
   Evaluation accuracy = 0.971354, Elapsed Time = 1.292748s
   Starting Epoch 9:
   Training loss = 90.059357, training accuracy = 0.969785
   Evaluation accuracy = 0.981170, Elapsed Time = 1.301958s
   ```
   
   After fix:
   ```
   root@64926e30597f:~/dcsysh/singa/examples/cnn# mpiexec -np 3 python3 train_mpi.py cnn mnist -l 0.015
   Starting Epoch 0:
   Training loss = 653.234863, training accuracy = 0.767194
   Evaluation accuracy = 0.936498, Elapsed Time = 1.364626s
   Starting Epoch 1:
   Training loss = 245.488037, training accuracy = 0.917201
   Evaluation accuracy = 0.959435, Elapsed Time = 1.311175s
   Starting Epoch 2:
   Training loss = 174.001266, training accuracy = 0.941757
   Evaluation accuracy = 0.959736, Elapsed Time = 1.324813s
   Starting Epoch 3:
   Training loss = 141.203125, training accuracy = 0.953292
   Evaluation accuracy = 0.971054, Elapsed Time = 1.330215s
   Starting Epoch 4:
   Training loss = 119.192688, training accuracy = 0.959519
   Evaluation accuracy = 0.973758, Elapsed Time = 1.302892s
   Starting Epoch 5:
   Training loss = 107.171661, training accuracy = 0.964443
   Evaluation accuracy = 0.975761, Elapsed Time = 1.314337s
   Starting Epoch 6:
   Training loss = 97.575897, training accuracy = 0.966513
   Evaluation accuracy = 0.977764, Elapsed Time = 1.304296s
   Starting Epoch 7:
   Training loss = 89.828827, training accuracy = 0.970753
   Evaluation accuracy = 0.975561, Elapsed Time = 1.316111s
   Starting Epoch 8:
   Training loss = 84.263199, training accuracy = 0.972189
   Evaluation accuracy = 0.979868, Elapsed Time = 1.298452s
   Starting Epoch 9:
   Training loss = 78.318733, training accuracy = 0.974059
   Evaluation accuracy = 0.981370, Elapsed Time = 1.308062s
   ```
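
To compare the two runs epoch by epoch, the logs can be parsed with a short script. The following is only a sketch: `parse_losses` and `relative_change` are hypothetical helper names, not part of SINGA, and the regular expression assumes the `Training loss = ...` line format shown in the logs above. Only epochs 0 and 9 are included to keep it short.

```python
import re

# Matches lines such as "Training loss = 867.269531, training accuracy = 0.682409".
# The pattern is an assumption based on the log format printed above.
LOSS_RE = re.compile(r"Training loss = ([0-9.]+)")


def parse_losses(log_text):
    """Return the training losses found in a log, in order of appearance."""
    return [float(value) for value in LOSS_RE.findall(log_text)]


def relative_change(before, after):
    """Relative change (after - before) / before for each pair of losses."""
    return [(a - b) / b for b, a in zip(before, after)]


# Excerpts copied from the "Before fix" and "After fix" logs (epochs 0 and 9).
before_log = """
Starting Epoch 0:
Training loss = 867.269531, training accuracy = 0.682409
Starting Epoch 9:
Training loss = 90.059357, training accuracy = 0.969785
"""
after_log = """
Starting Epoch 0:
Training loss = 653.234863, training accuracy = 0.767194
Starting Epoch 9:
Training loss = 78.318733, training accuracy = 0.974059
"""

before = parse_losses(before_log)
after = parse_losses(after_log)

for epoch, change in zip((0, 9), relative_change(before, after)):
    print(f"epoch {epoch}: training loss changed by {change:+.1%}")
```

A negative change means the reported training loss is lower after the fix, which is the case for both excerpted epochs.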


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


