I bumped into the definition of the softrelu gradient:
https://github.com/apache/incubator-mxnet/blob/master/src/operator/mshadow_op.h#L170
It is defined as 1 - exp(-x).
Since the forward pass of softrelu is the softplus function, log(1 + exp(x)),
shouldn't the gradient be the logistic function, 1 / (1 + exp(-x))?
My understanding is that the gradient of softrelu should go to zero
as x -> -Inf. That is not the case with the above definition, which
goes to -Inf as x -> -Inf.
https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
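
To make the comparison concrete, here is a small Python sketch using only the
standard library. The function names are made up for illustration and are not
MXNet's. It evaluates the logistic function and the expression 1 - exp(-v) at
a few points. The last column rests on an assumption that is not confirmed
from the source: if the backward pass were given the forward output
y = softplus(x) instead of x, then 1 - exp(-y) would be algebraically equal
to the logistic function of x.

```python
import math


def softplus(x):
    # Forward pass: log(1 + exp(x)), arranged to avoid overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic(x):
    # Analytical derivative of softplus: 1 / (1 + exp(-x)).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def grad_expr(v):
    # The expression under discussion: 1 - exp(-v).
    return 1.0 - math.exp(-v)


for x in (-20.0, -5.0, 0.0, 5.0):
    y = softplus(x)
    # Columns: x, logistic(x), expression applied to x, expression applied to y.
    # Applying the expression to y is an assumption about the caller.
    print(x, logistic(x), grad_expr(x), grad_expr(y))
```

Applied directly to x, the expression becomes large and negative for negative
x, which is the behaviour questioned above. Applied to y, it tracks the
logistic function, so the answer may depend on which value the gradient
functor is actually called with.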
Pedro.
