singa-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [singa] chrishkchris edited a comment on issue #591: Dev branch cpu training problem (with conv and pool)
Date Tue, 11 Feb 2020 06:49:28 GMT
chrishkchris edited a comment on issue #591: Dev branch cpu training problem (with conv and pool)
URL: https://github.com/apache/singa/issues/591#issuecomment-584485839
 
 
   I have tried both GCC OpenMP and Intel TBB (Threading Building Blocks) when compiling DNNL
from source.
   
   Training is extremely slow (the normal time per epoch should be around one minute), but the training
loss results are correct.
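   
   For context, DNNL selects its CPU threading runtime at build time through a CMake option. A rough sketch of how the two builds above could be produced (the build directory layout and the TBB install path are assumptions, not taken from this thread):
   
   ```shell
   # From a DNNL source checkout, in a fresh build directory:
   
   # Build with the GCC OpenMP runtime (the default with GCC on Linux)
   cmake -DDNNL_CPU_RUNTIME=OMP ..
   make -j
   
   # ...or build with Intel TBB instead; TBBROOT must point at a TBB
   # installation (the path below is a placeholder)
   cmake -DDNNL_CPU_RUNTIME=TBB -DTBBROOT=/opt/intel/tbb ..
   make -j
   ```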
   
   1. GCC OpenMP
   
   ```
   root@3edb30e30b08:~/dcsysh/singa/examples/autograd# python3 mnist_cnn.py
   Starting Epoch 0:
   Training loss = 564.547180, training accuracy = 0.800644
   Evaluation accuracy = 0.931591, Elapsed Time = 1348.363244s
   Starting Epoch 1:
   Training loss = 229.964905, training accuracy = 0.922892
   Evaluation accuracy = 0.959535, Elapsed Time = 1344.685418s
   Starting Epoch 2:
   Training loss = 163.646332, training accuracy = 0.944837
   Evaluation accuracy = 0.973758, Elapsed Time = 1346.530425s
   Starting Epoch 3:
   Training loss = 135.699615, training accuracy = 0.954526
   Evaluation accuracy = 0.970152, Elapsed Time = 1346.398193s
   Starting Epoch 4:
   Training loss = 115.944962, training accuracy = 0.962096
   Evaluation accuracy = 0.968750, Elapsed Time = 1349.933991s
   Starting Epoch 5:
   Training loss = 102.581963, training accuracy = 0.965548
   Evaluation accuracy = 0.976963, Elapsed Time = 1343.627475s
   Starting Epoch 6:
   Training loss = 91.995560, training accuracy = 0.969701
   Evaluation accuracy = 0.980168, Elapsed Time = 1345.709435s
   Starting Epoch 7:
   Training loss = 85.334785, training accuracy = 0.971051
   Evaluation accuracy = 0.977664, Elapsed Time = 1342.384448s
   Starting Epoch 8:
   Training loss = 81.609375, training accuracy = 0.972018
   Evaluation accuracy = 0.981571, Elapsed Time = 1345.214866s
   Starting Epoch 9:
   Training loss = 76.690147, training accuracy = 0.974203
   Evaluation accuracy = 0.977364, Elapsed Time = 1354.111479s
   
   ```
   2. Intel TBB (Threading Building Blocks)
   
   ```
   root@3edb30e30b08:~/dcsysh/singa/examples/autograd# python3 mnist_cnn.py
   Starting Epoch 0:
   Training loss = 566.089539, training accuracy = 0.800527
   Evaluation accuracy = 0.938201, Elapsed Time = 1571.624848s
   Starting Epoch 1:
   Training loss = 229.882874, training accuracy = 0.923192
   Evaluation accuracy = 0.957833, Elapsed Time = 1569.219801s
   Starting Epoch 2:
   Training loss = 164.734573, training accuracy = 0.945137
   Evaluation accuracy = 0.955929, Elapsed Time = 1567.359108s
   Starting Epoch 3:
   Training loss = 132.956802, training accuracy = 0.955310
   Evaluation accuracy = 0.968550, Elapsed Time = 1572.159664s
   Starting Epoch 4:
   Training loss = 117.263237, training accuracy = 0.960646
   Evaluation accuracy = 0.969151, Elapsed Time = 1570.090345s
   Starting Epoch 5:
   Training loss = 105.917274, training accuracy = 0.965115
   Evaluation accuracy = 0.978466, Elapsed Time = 1569.966338s
   Starting Epoch 6:
   Training loss = 93.056519, training accuracy = 0.968700
   Evaluation accuracy = 0.976362, Elapsed Time = 1571.289907s
   Starting Epoch 7:
   Training loss = 85.500954, training accuracy = 0.971101
   Evaluation accuracy = 0.981771, Elapsed Time = 1572.169596s
   ```
   
   3. The old mkldnn in the master branch, results copied from PR https://github.com/apache/singa/pull/579
   
   ```
   ubuntu@ip-172-31-24-48:~/singa/examples/autograd$ python3 mnist_cnn.py
   Starting Epoch 0:
   Training loss = 585.431152, training accuracy = 0.791739
   Evaluation accuracy = 0.930088, Elapsed Time = 55.447133s
   Starting Epoch 1:
   Training loss = 232.831589, training accuracy = 0.922158
   Evaluation accuracy = 0.967949, Elapsed Time = 55.337850s
   Starting Epoch 2:
   Training loss = 166.067307, training accuracy = 0.945788
   Evaluation accuracy = 0.968550, Elapsed Time = 55.367847s
   Starting Epoch 3:
   Training loss = 136.865341, training accuracy = 0.954092
   Evaluation accuracy = 0.973357, Elapsed Time = 55.358584s
   Starting Epoch 4:
   Training loss = 118.813286, training accuracy = 0.960195
   Evaluation accuracy = 0.979567, Elapsed Time = 55.270505s
   Starting Epoch 5:
   Training loss = 106.185112, training accuracy = 0.964481
   Evaluation accuracy = 0.975962, Elapsed Time = 55.281344s
   Starting Epoch 6:
   Training loss = 94.444023, training accuracy = 0.968016
   Evaluation accuracy = 0.980970, Elapsed Time = 55.081426s
   Starting Epoch 7:
   Training loss = 88.213493, training accuracy = 0.970418
   Evaluation accuracy = 0.982873, Elapsed Time = 54.912524s
   Starting Epoch 8:
   Training loss = 81.126442, training accuracy = 0.972886
   Evaluation accuracy = 0.981470, Elapsed Time = 54.907317s
   Starting Epoch 9:
   Training loss = 77.790993, training accuracy = 0.974236
   Evaluation accuracy = 0.974159, Elapsed Time = 54.915229s
   ```
   So the new DNNL build is roughly 24 to 28 times slower per epoch than the old mkldnn?
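   
   The slowdown can be read directly off the epoch-0 elapsed times in the logs above:
   
   ```python
   # Per-epoch slowdown of the DNNL builds relative to the old mkldnn build,
   # using the epoch-0 elapsed times reported in the logs above.
   dnnl_omp = 1348.36  # seconds/epoch, DNNL built with GCC OpenMP
   dnnl_tbb = 1571.62  # seconds/epoch, DNNL built with Intel TBB
   mkldnn = 55.45      # seconds/epoch, old mkldnn in the master branch
   
   print(f"OpenMP slowdown: {dnnl_omp / mkldnn:.1f}x")  # OpenMP slowdown: 24.3x
   print(f"TBB slowdown:    {dnnl_tbb / mkldnn:.1f}x")  # TBB slowdown:    28.3x
   ```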

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
