mxnet-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-mxnet] zixuanweeei commented on issue #17086: [MKLDNN] RNN Op gradient computation is broken
Date Fri, 27 Dec 2019 07:43:41 GMT
zixuanweeei commented on issue #17086: [MKLDNN] RNN Op gradient computation is broken
URL: https://github.com/apache/incubator-mxnet/issues/17086#issuecomment-569213067
 
 
   Hi @liuzh91, @szhengac. We have posted https://github.com/apache/incubator-mxnet/pull/17183
to fix the gradient-explosion issue in the RNN backward pass. Thanks for reporting this issue.
It would be greatly appreciated if you could test this patch. Thanks.
   
   BTW, we got the following training log:
   ```
   ❯ python word_language_model.py --log-interval=1
    /path/to/mxnet/python/mxnet/optimizer/optimizer.py:167: UserWarning: WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing optimizer mxnet.optimizer.optimizer.LAMB
     Optimizer.opt_registry[name].__name__))
    Namespace(alpha=2, batch_size=80, beta=1, bptt=70, clip=0.25, dropout=0.4, dropout_e=0.1, dropout_h=0.2, dropout_i=0.65, emsize=400, epochs=750, eval_only=False, gpu=None, log_interval=1, lr=30, lr_update_factor=0.1, lr_update_interval=30, model='lstm', nhid=1150, nlayers=3, ntasgd=False, optimizer='sgd', save='model.params', test_mode=False, tied=False, wd=1.2e-06, weight_dropout=0.5)
   Use AWDRNN
   AWDRNN(
     (embedding): HybridSequential(
       (0): Embedding(33278 -> 400, float32)
       (1): Dropout(p = 0.65, axes=(0,))
     )
     (encoder): HybridSequential(
       (0): LSTM(400 -> 1150, TNC)
       (1): LSTM(1150 -> 1150, TNC)
       (2): LSTM(1150 -> 1150, TNC)
     )
     (decoder): HybridSequential(
       (0): Dense(None -> 33278, linear)
     )
   )
    [Epoch 0 Batch 1/372] current loss 20.50, ppl 796977445.38, throughput 18.37 samples/s, lr 30.86
   [Epoch 0 Batch 2/372] current loss 9.51, ppl 13511.50, throughput 39.56 samples/s, lr 28.29
    [Epoch 0 Batch 3/372] current loss 17.53, ppl 41003388.51, throughput 40.65 samples/s, lr 27.43
   [Epoch 0 Batch 4/372] current loss 9.45, ppl 12761.47, throughput 40.39 samples/s, lr 27.43
    [Epoch 0 Batch 5/372] current loss 14.34, ppl 1695623.66, throughput 35.59 samples/s, lr 31.71
   [Epoch 0 Batch 6/372] current loss 9.40, ppl 12113.46, throughput 35.10 samples/s, lr 32.14
   [Epoch 0 Batch 7/372] current loss 8.56, ppl 5232.00, throughput 37.62 samples/s, lr 30.00
   [Epoch 0 Batch 8/372] current loss 9.32, ppl 11163.67, throughput 42.00 samples/s, lr 26.57
   [Epoch 0 Batch 9/372] current loss 8.44, ppl 4642.37, throughput 61.95 samples/s, lr 17.14
   [Epoch 0 Batch 10/372] current loss 8.92, ppl 7494.76, throughput 41.39 samples/s, lr 27.00
   ```
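
   For readers unfamiliar with the failure mode being fixed: the exploding-gradient behavior in an RNN backward pass can be illustrated with a plain NumPy sketch. This is not MXNet's MKL-DNN kernel, just a hypothetical toy model of BPTT: the incoming gradient is multiplied by the transposed recurrent weight once per unrolled time step, so any spectral norm above 1 makes its norm grow geometrically (the matrix size, step count, and scaling factor below are arbitrary choices for illustration).

   ```python
   import numpy as np

   rng = np.random.default_rng(0)
   T = 50   # number of unrolled time steps
   H = 8    # hidden size

   # Symmetric recurrent weight scaled so its spectral norm is 1.3 (> 1),
   # the regime in which backpropagated gradients explode.
   W = rng.standard_normal((H, H))
   W = (W + W.T) / 2
   W *= 1.3 / np.linalg.norm(W, 2)

   grad = np.ones(H)        # gradient arriving at the last time step
   norms = []
   for _ in range(T):       # BPTT: repeated multiplication by W^T
       grad = W.T @ grad
       norms.append(np.linalg.norm(grad))

   print(f"grad norm after 1 step:   {norms[0]:.3f}")
   print(f"grad norm after {T} steps: {norms[-1]:.3e}")
   ```

   Gradient clipping (the `clip=0.25` setting visible in the `Namespace` above) bounds the applied update, but a backward kernel that computes wrong gradients, as reported here, has to be fixed in the kernel itself.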

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
