singa-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [singa] chrishkchris commented on pull request #697: New Model Layer Operator API
Date Wed, 03 Jun 2020 06:19:12 GMT

chrishkchris commented on pull request #697:
URL: https://github.com/apache/singa/pull/697#issuecomment-637980576


   Regarding the error in evaluation accuracy: training on a single GPU, the PR branch shows the
bug, while the dev branch is fine.
   Note that cnn is fine while resnet and xceptionnet have the problem, so I suspect batchnorm,
but I am not sure.
   
   ```
   1. this PR branch
   
   root@33804dcbc1c1:~/dcsysh/singa/examples/cnn# python3 train.py xceptionnet cifar10 --bs 16
   Starting Epoch 0:
   Training loss = 11979.167969, training accuracy = 0.159080
   Evaluation accuracy = 0.100000, Elapsed Time = 633.876963s
   Starting Epoch 1:
   Training loss = 7296.525879, training accuracy = 0.311760
   Evaluation accuracy = 0.100000, Elapsed Time = 634.936632s
   Starting Epoch 2:
   Training loss = 5394.903320, training accuracy = 0.453420
   Evaluation accuracy = 0.100000, Elapsed Time = 635.466069s
   
   root@33804dcbc1c1:~/dcsysh/singa/examples/cnn# python3 train.py resnet cifar10 --id 1 --bs 32
   Starting Epoch 0:
   Training loss = 2914.102539, training accuracy = 0.344330
   Evaluation accuracy = 0.100160, Elapsed Time = 305.759969s
   Starting Epoch 1:
   Training loss = 2065.130371, training accuracy = 0.523668
   Evaluation accuracy = 0.099860, Elapsed Time = 310.018232s
   Starting Epoch 2:
   Training loss = 1643.553833, training accuracy = 0.628781
   Evaluation accuracy = 0.100160, Elapsed Time = 310.691379s
   
   2. dev branch
   root@c414bea0e577:~/dcsysh/singa2/examples/cnn# python3 train.py resnet cifar10 --id 2 --bs 32
   Starting Epoch 0:
   Training loss = 2259.674561, training accuracy = 0.479713
   Evaluation accuracy = 0.635517, Elapsed Time = 168.079483s
   Starting Epoch 1:
   Training loss = 1466.185791, training accuracy = 0.669894
   Evaluation accuracy = 0.730068, Elapsed Time = 167.397288s
   Starting Epoch 2:
   Training loss = 1147.042358, training accuracy = 0.745018
   Evaluation accuracy = 0.775240, Elapsed Time = 166.614314s
   ```
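
To illustrate why batchnorm is a plausible suspect for the pattern in the logs above (training accuracy rises while evaluation accuracy stays at about 0.10, which is chance level for the 10 classes of cifar10), here is a minimal toy sketch in plain Python. It is not SINGA code: `batchnorm_1d`, its parameters, and the momentum convention are assumptions made up for illustration. It only shows that if the running mean/variance are never carried over from training, evaluation mode normalizes with stale statistics and sees very different activations than training mode did.

```python
import random
import statistics


def batchnorm_1d(xs, running_mean, running_var, training, momentum=0.9, eps=1e-5):
    """Toy batch norm over one feature (hypothetical helper, no scale/shift).

    Returns (normalized values, running mean, running var).
    Momentum convention assumed here: new = momentum * old + (1 - momentum) * batch.
    """
    if training:
        # Training mode: normalize with the statistics of the current batch
        # and fold them into the running estimates.
        mean = statistics.fmean(xs)
        var = statistics.pvariance(xs, mu=mean)
        out = [(x - mean) / (var + eps) ** 0.5 for x in xs]
        running_mean = momentum * running_mean + (1 - momentum) * mean
        running_var = momentum * running_var + (1 - momentum) * var
        return out, running_mean, running_var
    # Evaluation mode: normalize with the stored running statistics only.
    out = [(x - running_mean) / (running_var + eps) ** 0.5 for x in xs]
    return out, running_mean, running_var


random.seed(0)
batch = [random.gauss(5.0, 3.0) for _ in range(512)]

# Training-mode output is centered regardless of the running statistics.
train_out, _, _ = batchnorm_1d(batch, 0.0, 1.0, training=True)

# Case A: running statistics were never updated (still at their initial 0 and 1).
eval_stale, _, _ = batchnorm_1d(batch, 0.0, 1.0, training=False)

# Case B: running statistics were updated over many training steps.
rm, rv = 0.0, 1.0
for _ in range(200):
    _, rm, rv = batchnorm_1d(batch, rm, rv, training=True)
eval_tracked, _, _ = batchnorm_1d(batch, rm, rv, training=False)

print("train mode mean:        ", statistics.fmean(train_out))
print("eval mean, stale stats: ", statistics.fmean(eval_stale))
print("eval mean, tracked stats:", statistics.fmean(eval_tracked))
```

In case A the evaluation output is far from zero-centered, so every layer after the batchnorm receives inputs it was never trained on. That would be consistent with cnn (no batchnorm) being fine while resnet and xceptionnet fail, but it is only one possible explanation and still needs to be confirmed against the PR branch.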
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


