mahout-dev mailing list archives

From "Yexi Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient
Date Mon, 10 Jun 2013 20:08:20 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679837#comment-13679837 ]

Yexi Jiang edited comment on MAHOUT-975 at 6/10/13 8:07 PM:
------------------------------------------------------------

[~smarthi] When I apply this patch, the source code cannot be compiled. One of the errors
is that hiddenActivations cannot be resolved. Another is that the class Functions.NEGATE
is misspelled as Function.NEGATE.
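
For reference, a minimal sketch of the spelling that should compile against the Mahout math
API (the class name and vector values below are only illustrative):

import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.function.Functions;

public class NegateSketch {
  public static void main(String[] args) {
    Vector v = new DenseVector(new double[] {1.0, -2.0, 3.0});
    // Functions (plural) is the class that declares NEGATE; Function.NEGATE does not resolve.
    v.assign(Functions.NEGATE);
    System.out.println(v); // every element negated in place
  }
}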




                
> Bug in Gradient Machine  - Computation of the gradient
> ------------------------------------------------------
>
>                 Key: MAHOUT-975
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-975
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Assignee: Ted Dunning
>             Fix For: 0.8
>
>         Attachments: GradientMachine.patch
>
>
> The initialisation used to compute the gradient descent weight updates for the output units
> appears to be wrong:
>  
> In the comment: "dy / dw is just w since  y = x' * w + b."
> This is wrong: dy/dw is x (ignoring the indices). The same incorrect initialisation is
> done in the code.
> Check by using neural network terminology:
> The gradient machine is a specialized version of a multi-layer perceptron (MLP).
> In an MLP the gradient for computing the "weight change" for the output units is:
> dE / dw_ij = dE / dz_i * dz_i / dw_ij   with   z_i = sum_j (w_ij * a_j)
> Here i is the index over the output layer and j the index over the hidden layer
> (d stands for the partial derivative).
> Here z_i = a_i (no squashing in the output layer),
> and the special loss (cost function) is E = 1 - a_g + a_b = 1 - z_g + z_b
> with
> g: index of the output unit with target value +1 (positive class)
> b: index of a randomly chosen output unit with target value 0
> =>
> dE / dw_gj = dE/dz_g * dz_g/dw_gj = -1 * a_j   (a_j: activity of hidden unit j)
> dE / dw_bj = dE/dz_b * dz_b/dw_bj = +1 * a_j   (a_j: activity of hidden unit j)
> That is exactly the result the comment would give if it were corrected:
> dy/dw = x (where x is the activation of the hidden unit), multiplied by (-1) for the weights to
> the output unit with target value +1. (A sketch of this corrected update follows the quoted issue below.)
> ------------
> In neural network implementations it is common to compute the gradient
> numerically in order to test the implementation (a second sketch below illustrates this). This can be done by:
> dE/dw_ij ≈ (E(w_ij + epsilon) - E(w_ij - epsilon)) / (2 * epsilon)
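
A minimal sketch, in plain Java, of the corrected output-layer update that the derivation above
arrives at; the names hidden, wGood, wBad and learningRate are illustrative placeholders, not
the actual GradientMachine fields:

public class OutputGradientSketch {

  // hidden[j] = a_j, the activation of hidden unit j.
  // wGood / wBad are the weights into the output unit with target +1 ("good")
  // and into the randomly chosen output unit with target 0 ("bad").
  // Loss: E = 1 - z_g + z_b, hence dE/dw_gj = -a_j and dE/dw_bj = +a_j.
  static void updateOutputWeights(double[] hidden, double[] wGood,
                                  double[] wBad, double learningRate) {
    for (int j = 0; j < hidden.length; j++) {
      wGood[j] -= learningRate * (-hidden[j]); // w_gj -= lr * dE/dw_gj
      wBad[j]  -= learningRate * ( hidden[j]); // w_bj -= lr * dE/dw_bj
    }
  }

  public static void main(String[] args) {
    double[] hidden = {0.5, -0.2, 0.8};
    double[] wGood  = {0.1, 0.1, 0.1};
    double[] wBad   = {0.1, 0.1, 0.1};
    updateOutputWeights(hidden, wGood, wBad, 0.1);
    System.out.println(java.util.Arrays.toString(wGood)); // pushed towards the hidden activations
    System.out.println(java.util.Arrays.toString(wBad));  // pushed away from them
  }
}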

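And a sketch of the numerical (two-sided finite-difference) gradient check described at the end
of the report; the Loss interface and the quadratic example loss are placeholders, not Mahout APIs:

public class GradientCheckSketch {

  // Placeholder for an evaluation of the loss E(w) at the current weights.
  interface Loss {
    double value(double[] w);
  }

  // dE/dw_i is approximately (E(w_i + eps) - E(w_i - eps)) / (2 * eps)
  static double numericalGradient(double[] w, int i, double eps, Loss loss) {
    double saved = w[i];
    w[i] = saved + eps;
    double plus = loss.value(w);
    w[i] = saved - eps;
    double minus = loss.value(w);
    w[i] = saved; // restore the original weight
    return (plus - minus) / (2.0 * eps);
  }

  public static void main(String[] args) {
    // Example: E(w) = 0.5 * w0^2 has the analytic gradient dE/dw0 = w0.
    Loss quadratic = new Loss() {
      @Override
      public double value(double[] w) {
        return 0.5 * w[0] * w[0];
      }
    };
    double[] w = {3.0};
    System.out.println(numericalGradient(w, 0, 1e-6, quadratic)); // approximately 3.0
  }
}
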
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
