Hi Pedro,
these are just helper functions; you need to check the operator that uses
them. In this case, the function is the derivative as a function of the
*output*, which is cheaper to compute:
y = log(1 + exp(x)) => dy/dx = 1/(1 + exp(-x)) = 1 - exp(-y)
If you check other ops, you will find the same pattern. You always need to
check the code for the operator.
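
A quick numerical check of the identity above, as a minimal sketch in plain
Python. The function names are made up for illustration and are not the
MXNet code; the point is only that 1 - exp(-y), evaluated at the output
y = softplus(x), matches the logistic function evaluated at the input x:

```python
import math

def softrelu(x):
    # Forward pass: softplus, y = log(1 + exp(x)),
    # written in a form that does not overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softrelu_grad_from_output(y):
    # Gradient expressed in terms of the OUTPUT y: 1 - exp(-y).
    # -expm1(-y) is the same quantity, but accurate for small y.
    return -math.expm1(-y)

def logistic(x):
    # Gradient expressed in terms of the INPUT x: 1 / (1 + exp(-x)).
    return 1.0 / (1.0 + math.exp(-x))

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    y = softrelu(x)
    assert math.isclose(softrelu_grad_from_output(y), logistic(x), rel_tol=1e-9)
```

Passing the input x instead of the output y to the helper is what makes the
result look wrong for negative x.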
In any case, there are quite a few unit tests that would catch this, except
of course if people added functions after I did this and did not update the
unit tests.
Bye, Matthias
On Wed, Nov 21, 2018 at 12:52 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
wrote:
> I bumped into the definition of the softrelu gradient:
>
>
> https://github.com/apache/incubator-mxnet/blob/master/src/operator/mshadow_op.h#L170
>
> Which is defined as 1 - exp(-x)
>
> As we define the forward of the softrelu as the softplus function,
> shouldn't the gradient be the logistic function?
>
> It is my understanding that the gradient of the softrelu should go down
> to zero as Lim x -> -Inf, which is not the case with the above
> definition, which goes to -Inf as Lim x -> -Inf
>
> https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
>
>
> Pedro.
>

Matthias Seeger
