mahout-dev mailing list archives

From "Olivier Grisel (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-228) Need sequential logistic regression implementation using SGD techniques
Date Tue, 19 Jan 2010 02:00:54 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802032#action_12802032 ]

Olivier Grisel commented on MAHOUT-228:
---------------------------------------

For the record: I am working on adding more tests and debugging in the following branch (kept
in sync with the trunk), hosted on GitHub:

  http://github.com/ogrisel/mahout/commits/MAHOUT-228

Fixed so far:
 - convergence issues (inconsistency in the index of the 'missing' beta row)
 - make sure that L1 is sparsity-inducing by applying eager post-update regularization
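
To illustrate the second point, here is a minimal sketch of what an eagerly applied, clipped L1 update can look like in a plain binary logistic regression. It is not the code from the branch: the class and method names (L1SgdSketch, shrink, train) are hypothetical, and reading "eager" as "shrink every coefficient after every gradient step" is an assumption. The clipping at zero is what lets coefficients become exactly zero instead of oscillating around it.

```java
import java.util.Arrays;

// Hypothetical sketch, not the MAHOUT-228 patch: binary logistic regression
// trained by SGD with an eagerly applied, clipped L1 penalty.
final class L1SgdSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Move w toward zero by `amount`, clipping at zero so the sign never flips.
    static double shrink(double w, double amount) {
        if (w > 0) return Math.max(0.0, w - amount);
        if (w < 0) return Math.min(0.0, w + amount);
        return 0.0;
    }

    // x: dense feature rows, y: labels in {0, 1}, lambda: L1 strength (assumed names).
    static double[] train(double[][] x, int[] y, double rate, double lambda, int epochs) {
        int d = x[0].length;
        double[] beta = new double[d];
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < x.length; i++) {
                double z = 0.0;
                for (int j = 0; j < d; j++) z += beta[j] * x[i][j];
                double err = y[i] - sigmoid(z);
                for (int j = 0; j < d; j++) {
                    // Gradient step on the log-likelihood...
                    beta[j] += rate * err * x[i][j];
                    // ...then the L1 shrink, applied to every coefficient right away.
                    beta[j] = shrink(beta[j], rate * lambda);
                }
            }
        }
        return beta;
    }

    public static void main(String[] args) {
        // Toy data: feature 0 separates the classes, feature 1 is never active.
        double[][] x = {{1, 0}, {1, 0}, {-1, 0}, {-1, 0}};
        int[] y = {1, 1, 0, 0};
        System.out.println(Arrays.toString(train(x, y, 0.1, 0.01, 50)));
    }
}
```

A lazy variant would defer the shrink for features absent from the current example and catch up later; the eager form above trades speed on sparse data for simplicity.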

Still TODO (independently of Ted's TODOs); these might be split into specific JIRA issues:
 - test that a highly redundant dataset can lead to very sparse models with an L1 prior
 - a Hadoop driver for parallel extraction of feature vectors from documents using the Randomizer
classes
 - a Hadoop driver for parallel cross-validation and confusion matrix evaluation (along with
confidence intervals)
 - a Hadoop driver to perform hyperparameter grid search (lambda, priorfunc, learning rate,
...)
 - a sample Hadoop driver to categorize Wikipedia articles by country
 - profile it a bit
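
For the evaluation item, a sequential sketch of the per-fold computation may help pin down what such a driver would aggregate. Everything here is an assumption for illustration: the class name ConfusionMatrixSketch, the counts[actual][predicted] layout, and the choice of a 95% normal-approximation (Wald) interval on accuracy; nothing below involves Hadoop or existing Mahout classes.

```java
// Hypothetical sketch: confusion matrix, accuracy, and a Wald confidence
// interval for one evaluation fold. Not part of Mahout.
final class ConfusionMatrixSketch {

    // counts[a][p] = number of examples with actual class a predicted as class p.
    static int[][] confusion(int[] actual, int[] predicted, int numClasses) {
        int[][] counts = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            counts[actual[i]][predicted[i]]++;
        }
        return counts;
    }

    // Fraction of examples on the diagonal of the confusion matrix.
    static double accuracy(int[][] counts) {
        int correct = 0;
        int total = 0;
        for (int a = 0; a < counts.length; a++) {
            for (int p = 0; p < counts[a].length; p++) {
                total += counts[a][p];
                if (a == p) correct += counts[a][p];
            }
        }
        return (double) correct / total;
    }

    // 95% normal-approximation interval {low, high}, clipped to [0, 1].
    static double[] waldInterval(double acc, int n) {
        double half = 1.96 * Math.sqrt(acc * (1.0 - acc) / n);
        return new double[] {Math.max(0.0, acc - half), Math.min(1.0, acc + half)};
    }

    public static void main(String[] args) {
        int[] actual = {0, 0, 1, 1, 1, 0, 1, 0};
        int[] predicted = {0, 1, 1, 1, 0, 0, 1, 0};
        int[][] counts = confusion(actual, predicted, 2);
        double acc = accuracy(counts);
        double[] ci = waldInterval(acc, actual.length);
        System.out.println("accuracy=" + acc + " ci=[" + ci[0] + ", " + ci[1] + "]");
    }
}
```

Since confusion counts simply add up across folds, a parallel version could compute one matrix per fold and sum them before deriving accuracy and the interval. The Wald interval is crude for small samples, so a different interval may be preferable in practice.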


> Need sequential logistic regression implementation using SGD techniques
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-228
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-228
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Ted Dunning
>             Fix For: 0.3
>
>         Attachments: logP.csv, MAHOUT-228-3.patch, r.csv, sgd-derivation.pdf, sgd-derivation.tex, sgd.csv
>
>
> Stochastic gradient descent (SGD) is often fast enough for highly scalable learning (see
> Vowpal Wabbit, http://hunch.net/~vw/).
> I often need to have a logistic regression in Java as well, so that is a reasonable place
> to start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

