mahout-dev mailing list archives

From "Yexi Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient
Date Tue, 11 Jun 2013 21:02:20 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680673#comment-13680673 ]

Yexi Jiang commented on MAHOUT-975:
-----------------------------------

The size of goodLabels in updateRanking is always 1, so there seems to be no need for a loop.
Also, the existing test case does not pass: an ArrayIndexOutOfBoundsException is thrown.
                
> Bug in Gradient Machine  - Computation of the gradient
> ------------------------------------------------------
>
>                 Key: MAHOUT-975
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-975
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Assignee: Ted Dunning
>             Fix For: Backlog
>
>         Attachments: GradientMachine2.java, GradientMachine.patch, MAHOUT-975.patch
>
>
> The initialisation used to compute the gradient-descent weight updates for the output units appears to be wrong:
>  
> In the comment: "dy / dw is just w since y = x' * w + b."
> This is wrong: dy/dw is x (ignoring the indices). The same incorrect initialisation is used in the code.
> Check using neural network terminology:
> The gradient machine is a specialized version of a multi-layer perceptron (MLP).
> In an MLP, the gradient used to compute the "weight change" for the output units is:
> dE/dw_ij = dE/dz_i * dz_i/dw_ij with z_i = sum_j (w_ij * a_j)
> here: i is the index into the output layer, j the index into the hidden layer
> (d stands for the partial derivative)
> here: z_i = a_i (no squashing in the output layer)
> with the special loss (cost function) E = 1 - a_g + a_b = 1 - z_g + z_b
> where
> g: index of the output unit with target value +1 (positive class)
> b: index of a random output unit with target value 0
> =>
> dE/dw_gj = dE/dz_g * dz_g/dw_gj = -1 * a_j (a_j: activity of hidden unit j)
> dE/dw_bj = dE/dz_b * dz_b/dw_bj = +1 * a_j
> This is exactly what the comment would give if it were corrected:
> dy/dw = x (where x is the activation of the hidden unit), times -1 for the weights into the output unit with target value +1.
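>
> To make the corrected update concrete, here is a minimal standalone Java sketch (a sketch only, not the actual GradientMachine code; the names hidden, wGood, wBad, and learningRate are illustrative):
>
>     public class OutputGradientSketch {
>       public static void main(String[] args) {
>         double learningRate = 0.1;
>         double[] hidden = {0.2, -0.5, 0.8};  // a_j: activations of the hidden units
>         double[] wGood = new double[3];      // w_gj: weights into the positive output unit g
>         double[] wBad = new double[3];       // w_bj: weights into the sampled output unit b
>         for (int j = 0; j < hidden.length; j++) {
>           wGood[j] -= learningRate * (-hidden[j]);  // dE/dw_gj = -a_j
>           wBad[j]  -= learningRate * (+hidden[j]);  // dE/dw_bj = +a_j
>         }
>         System.out.println(java.util.Arrays.toString(wGood));
>         System.out.println(java.util.Arrays.toString(wBad));
>       }
>     }
>
> The signs of the two steps follow directly from the derivatives above: gradient descent moves the weights into the positive unit up and the weights into the sampled unit down, each in proportion to a_j.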
> ------------
> In neural network implementations it is common to check the gradient computation numerically as a test of the implementation. This can be done with the central difference:
> dE/dw_ij = (E(w_ij + epsilon) - E(w_ij - epsilon)) / (2 * epsilon)
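>
> A minimal standalone Java sketch of such a numerical check (the toy loss and all names here are illustrative, not taken from the Mahout code):
>
>     import java.util.function.DoubleUnaryOperator;
>
>     public class GradientCheckSketch {
>       // Central difference: dE/dw ~= (E(w + eps) - E(w - eps)) / (2 * eps)
>       static double numericalGradient(DoubleUnaryOperator loss, double w, double eps) {
>         return (loss.applyAsDouble(w + eps) - loss.applyAsDouble(w - eps)) / (2 * eps);
>       }
>
>       public static void main(String[] args) {
>         // Toy loss E(w) = (w - 3)^2 with known analytic gradient 2 * (w - 3).
>         DoubleUnaryOperator loss = w -> (w - 3) * (w - 3);
>         double w = 1.5;
>         System.out.printf("analytic = %.6f, numeric = %.6f%n",
>             2 * (w - 3), numericalGradient(loss, w, 1e-6));
>       }
>     }
>
> Comparing the numeric estimate against the analytically computed gradient for each weight makes errors like the one described above easy to catch.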

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
