mahout-dev mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAHOUT-228) Need sequential logistic regression implementation using SGD techniques
Date Fri, 25 Dec 2009 21:02:29 GMT

     [ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated MAHOUT-228:
-------------------------------

    Attachment: r.csv
                logP.csv
                sgd.csv


I have been doing some testing on the training algorithm and there seems to be a glitch in
it.  The problem is that the prior gradient is strong enough that, for anything but a very small
lambda, the regularization zeros out all of the coefficients on every iteration.  Not good.
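As a rough illustration (in R, not the actual Java code) of the kind of per-step L1 shrinkage
involved, the update below shows why a large enough per-step penalty clips everything to zero;
the shrink function and the learning rate eta are illustrative names, not anything from the patch:
{noformat}
# Soft-threshold form of a per-example L1 (Laplacian prior) update.
# Every step pulls each coefficient toward zero by eta * lambda; whenever
# that amount exceeds the coefficient's magnitude after the likelihood
# update, the coefficient is clipped to exactly zero.
shrink <- function(beta, eta, lambda) {
    sign(beta) * pmax(abs(beta) - eta * lambda, 0)
}
{noformat}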

I will attach some sample data that I have been using for these experiments.  The reference
for these experiments was an optimization I did in R where I explicitly optimized a simple
example and got very plausible results.

For the R example, I used the following definition of the function to optimize:

{noformat}
f <- function(beta) {
    # probability of class 1 for each row of x
    p <- w(rowSums(x %*% matrix(beta, ncol = 1)))
    # negative log likelihood; the (p == 0) and (p == 1) terms guard against log(0)
    r1 <- -sum(y * log(p + (p == 0)) + (1 - y) * log(1 - p + (p == 1)))
    # L1 penalty from the Laplacian prior
    r2 <- lambda * sum(abs(beta))
    r1 + r2
}

# logistic (inverse logit) function
w <- function(x) {
    1 / (1 + exp(-x))
}
{noformat}
Here beta is the coefficient vector, lambda sets the amount of regularization, x holds the input
vectors with one observation per row, y holds the known categories for the rows of x, f combines
the negative log likelihood (r1) and the negative log prior (r2), and w is the logistic function.
I used the full, unsimplified form of the logistic likelihood rather than the usual shortcut;
normally a simpler form such as -sum(y - p) is used, but I wanted to keep things straightforward.
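In equation form, f is the negative log-MAP objective
{noformat}
f(\beta) = -\sum_i \bigl[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \bigr] + \lambda \sum_j |\beta_j|,
\qquad p_i = \frac{1}{1 + \exp(-x_i \cdot \beta)}
{noformat}
where the lambda term is, up to an additive constant, the negative log density of a Laplacian
prior on the coefficients.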

The attached file sgd.csv contains the values of x.  The value of y is simply 30 0's followed
by 30 1's.
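
For reference, the setup amounts to something like the following; the read.csv options are a
guess at the layout of the attachment, and starting beta at zero is just one reasonable choice:
{noformat}
# Load the observations; exact read.csv options depend on the file layout.
x <- as.matrix(read.csv("sgd.csv", header = FALSE))
# 30 examples of class 0 followed by 30 of class 1.
y <- c(rep(0, 30), rep(1, 30))
# Starting point for the optimizer.
beta <- rep(0, ncol(x))
{noformat}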

Optimization was done using this (f picks up lambda from the global environment, so re-assigning
lambda before each call changes the penalty):
{noformat}
lambda <- 0.1
beta.01 <- optim(beta,f, method="CG", control=list(maxit=10000))
lambda <- 1
beta.1 <- optim(beta,f, method="CG", control=list(maxit=10000))
lambda <- 10
beta.10 <- optim(beta,f, method="CG", control=list(maxit=10000))
{noformat}
The values obtained for beta are contained in the file r.csv and the corresponding log-MAP
likelihoods are in logP.csv.

I will shortly add a patch that has my initial test in it.  This patch will contain these
test data files.  I will be working on this problem off and on over the next few days, but
any hints that anybody has are welcome.  My expectation is that there is a silly oversight
in my Java code.




> Need sequential logistic regression implementation using SGD techniques
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-228
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-228
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Ted Dunning
>             Fix For: 0.3
>
>         Attachments: logP.csv, MAHOUT-228-1.patch, MAHOUT-228-2.patch, r.csv, sgd.csv
>
>
> Stochastic gradient descent (SGD) is often fast enough for highly scalable learning (see Vowpal Wabbit, http://hunch.net/~vw/).
> I often need to have a logistic regression in Java as well, so that is a reasonable place to start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

