mxnet-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-mxnet] apeforest edited a comment on issue #15120: [bug] fix higher grad log
Date Thu, 06 Jun 2019 18:18:40 GMT
apeforest edited a comment on issue #15120: [bug] fix higher grad log 
URL: https://github.com/apache/incubator-mxnet/pull/15120#issuecomment-499608736
 
 
   @kshitij12345 I think it's because of the design of the Python backward API in MXNet. When you specify `variables=x`, MXNet will only compute gradients for the input variables listed in `variables`. I did some experiments to back up my point:
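   
   For concreteness, here is the minimal setup I have in mind for the snippets below (using `log` as in this PR; the variable names are just illustrative):
   ```
   from mxnet import nd, autograd
   
   x = nd.array([1.0, 2.0, 3.0])
   x.attach_grad()
   
   y_grad = nd.ones_like(x)           # head gradient for the first backward pass
   y_grad.attach_grad()
   head_grad_grads = nd.ones_like(x)  # head gradient for the second backward pass
   
   with autograd.record():
       y = nd.log(x)
       # only `x` is listed in `variables`, so only dy/dx is computed here
       x_grad = autograd.grad(heads=y, variables=x, head_grads=y_grad, create_graph=True, retain_graph=True)[0]
   ```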
   
   As in your case 2:
   ```
   x_grad = autograd.grad(heads=y, variables=x, head_grads=y_grad, create_graph=True, retain_graph=True)[0]
   ```
   If you then perform another backward on `x_grad` as `x_grad.backward(out_grad=head_grad_grads)`, `y_grad` is not listed as an input variable and therefore its gradient is zero.
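   
   With the setup sketched above, a quick check of this looks like the following (the comments give the values I would expect for `log`):
   ```
   x_grad.backward(out_grad=head_grad_grads)
   print(x.grad)       # expected: -y_grad / x**2, scaled by head_grad_grads
   print(y_grad.grad)  # stays all zeros, since y_grad was never listed in `variables`
   ```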
   
   As in your case 1:
   ```
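   # x_grad_mid here is presumably the first-order dy/dx from an earlier autograd.grad call made without head_grads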
   x_grad = x_grad_mid * y_grad # Note
   x_grad.backward(out_grad=head_grad_grads)
   ```
   You implicitly made `y_grad` an input variable (by multiplying it into `x_grad`) before calling backward on `x_grad`, and that is why you get values in `y_grad.grad`.
   
   I replaced the `backward()` method with an explicit `autograd.grad()` call, which should invoke the same C++ backend function, and the result differs depending on which variables are listed:
   
   Case 1.1: If I do the following, I again don't get any values for `y_grad`, because the output only contains the gradient for the one listed variable:
   ```
   out_grad = autograd.grad(heads=x_grad, variables=x, head_grads=head_grad_grads, create_graph=False, retain_graph=False)
   print(out_grad[0])   # value equals expected_grad_grad
   ```
   
   Case 1.2: If I explicitly set `y_grad` as an input variable, I then get the expected result, as in your case 1:
   ```
   out_grad = autograd.grad(heads=x_grad, variables=[x, y_grad], head_grads=head_grad_grads, create_graph=False, retain_graph=False)
   print(out_grad[0])   # value equals expected_grad_grad
   print(out_grad[1])   # value equals expected_heads_grad
   ```
   
   At this point, I am not sure this is a bug, because the backward API is designed differently from PyTorch's. If `y_grad` is not specified as one of the input variables to compute gradients for, it will not get a value assigned even if you call `y_grad.attach_grad()` on it. This seems consistent with the API spec. Also, given that the gradient with respect to `y_grad` does not carry really useful values, I don't feel the necessity to store it. Please let me know if this makes sense. Thanks a lot for your careful drawing and insightful discussion.
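   
   (For reference, and assuming I remember PyTorch's semantics correctly: there `.backward()` accumulates into every leaf tensor that requires grad, so the analogue of your case 1 fills in the head gradient's `.grad` without it being listed anywhere.)
   ```
   import torch
   
   x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
   grad_y = torch.ones_like(x, requires_grad=True)
   
   y = torch.log(x)
   # create_graph records the backward pass, including the multiplication by grad_y
   x_grad, = torch.autograd.grad(y, x, grad_outputs=grad_y, create_graph=True)
   
   x_grad.backward(torch.ones_like(x))
   print(x.grad)       # -grad_y / x**2
   print(grad_y.grad)  # 1 / x, even though grad_y was never listed explicitly
   ```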
   
   
   
   

