szhengac commented on a change in pull request #10350: Fix Gluon Language Model Example
URL: https://github.com/apache/incubator-mxnet/pull/10350#discussion_r178449611

##########
File path: example/gluon/word_language_model/train.py
##########

@@ -159,19 +160,19 @@ def train():
             hidden = detach(hidden)
             with autograd.record():
                 output, hidden = model(data, hidden)
+                # Here L is a vector of size batch_size * bptt
                 L = loss(output, target)
+                L = L / (args.bptt * args.batch_size)
                 L.backward()
             grads = [p.grad(context) for p in model.collect_params().values()]
-            # Here gradient is for the whole batch.
-            # So we multiply max_norm by batch_size and bptt size to balance it.
-            gluon.utils.clip_global_norm(grads, args.clip * args.bptt * args.batch_size)
+            gluon.utils.clip_global_norm(grads, args.clip)
-            trainer.step(args.batch_size)
+            trainer.step(1)

Review comment:
   Yes, the loss has been rescaled manually. Also, we should rescale the loss by batch_size * bptt instead.
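For readers following the arithmetic behind this diff: `loss(output, target)` returns one value per token, so its summed gradient grows with both `batch_size` and `bptt`. Dividing `L` by `args.bptt * args.batch_size` before `backward()` turns that into a mean per-token gradient, which is why `args.clip` can now be applied unscaled and why `trainer.step(1)` is needed (stepping with `batch_size` would divide the gradient a second time). Below is a minimal, self-contained sketch of that pattern; the toy model, dummy data, and hyperparameter values are placeholders for illustration, not the example's actual ones.

```python
import mxnet as mx
from mxnet import autograd, gluon

context = mx.cpu()
batch_size, bptt, vocab_size, clip = 4, 5, 10, 0.25  # placeholder values

# Toy stand-in for the language model (not the PR's RNN model).
model = gluon.nn.Sequential()
model.add(gluon.nn.Embedding(vocab_size, 8))
model.add(gluon.nn.Dense(vocab_size, flatten=False))
model.initialize(ctx=context)

loss = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': 1.0})

# Dummy (bptt, batch_size) batches of token ids.
data = mx.nd.random.randint(0, vocab_size, (bptt, batch_size), ctx=context).astype('float32')
target = mx.nd.random.randint(0, vocab_size, (bptt, batch_size), ctx=context).astype('float32')

with autograd.record():
    output = model(data)        # (bptt, batch_size, vocab_size)
    L = loss(output, target)    # batch_size * bptt per-token losses
    # Rescale so backward() produces the mean per-token gradient.
    L = L / (bptt * batch_size)
L.backward()

grads = [p.grad(context) for p in model.collect_params().values()]
# The gradient norm no longer grows with batch_size or bptt,
# so the clip threshold is used unscaled.
gluon.utils.clip_global_norm(grads, clip)

# The loss is already normalized; stepping with batch_size=1 avoids
# the trainer dividing the gradient by batch_size a second time.
trainer.step(1)
```

Either convention gives the same update as long as the loss scaling, clip threshold, and `trainer.step` argument stay consistent; the PR's version has the advantage that `args.clip` keeps the same meaning regardless of `batch_size` and `bptt`.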