mahout-dev mailing list archives

From "Olivier Grisel (JIRA)" <>
Subject [jira] Commented: (MAHOUT-228) Need sequential logistic regression implementation using SGD techniques
Date Tue, 19 Jan 2010 02:00:54 GMT


Olivier Grisel commented on MAHOUT-228:

For the record: I am working on adding more tests and debugging in the following branch (kept
in sync with the trunk), hosted on GitHub:

Fixed so far:
 - convergence issues (inconsistency in the index of the 'missing' beta row)
 - make sure that L1 is sparsity-inducing by applying eager post-update regularization
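
To illustrate the second fix, here is a minimal single-machine sketch in Python of what "eager post-update regularization" can look like. It is not the code from the patch: the function names, the default values and the choice of soft-thresholding (shrink toward zero, clip at zero) are assumptions made for illustration. The point is that the L1 shrink is applied right after every gradient step and never lets a weight cross zero, so weights can become exactly 0.0 rather than merely small.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, shrink):
    """Move w toward zero by `shrink`, clipping at zero instead of crossing it."""
    if w > shrink:
        return w - shrink
    if w < -shrink:
        return w + shrink
    return 0.0


def train_sgd_l1(rows, labels, n_features, lam=0.05, rate=0.1, epochs=50, seed=0):
    """Binary logistic regression by SGD with an eager L1 step (hypothetical helper).

    rows: list of dense feature lists; labels: list of 0/1 targets.
    lam and rate are illustrative defaults, not values from the patch.
    """
    rng = random.Random(seed)
    beta = [0.0] * n_features
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = rows[i], labels[i]
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            for j in range(n_features):
                # Gradient step on the log-likelihood for this example.
                w = beta[j] + rate * (y - p) * x[j]
                # Eager L1 step: regularize immediately after the update.
                beta[j] = soft_threshold(w, rate * lam)
    return beta
```

A feature that never carries signal stays at exactly zero under this scheme, which is the property a sparsity test would check.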

Still TODO (independently of Ted's TODOs), might be split into specific JIRA issues:
 - test that a highly redundant dataset can lead to very sparse models with an L1 prior
 - a Hadoop driver to do parallel extraction of vector features of documents using the Randomizer
 - a Hadoop driver to do parallel cross validation and confusion matrix evaluation (along with
confidence intervals)
 - a Hadoop driver to perform hyperparameter grid search (lambda, priorfunc, learning rate,
 - a sample Hadoop driver to categorize Wikipedia articles by country
 - profile it a bit
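
As a rough picture of the evaluation item above, the following Python sketch builds a confusion matrix and attaches a confidence interval to the accuracy. It runs in a single process, not as a Hadoop driver. The helper names are hypothetical, and the Wilson score interval is an assumption, since the TODO does not say which interval is intended.

```python
import math


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy_interval(matrix, z=1.96):
    """Return (accuracy, low, high) using the Wilson score interval.

    z=1.96 corresponds to roughly 95% coverage.
    """
    n = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    p = correct / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half, center + half


# Toy labels, invented purely to exercise the helpers.
actual = ["fr", "fr", "us", "us", "us"]
predicted = ["fr", "us", "us", "us", "fr"]
matrix = confusion_matrix(actual, predicted, ["fr", "us"])
accuracy, low, high = accuracy_interval(matrix)
```

In a cross-validation setting the per-fold matrices could be summed element-wise before calling `accuracy_interval`, so the interval reflects all held-out examples.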

> Need sequential logistic regression implementation using SGD techniques
> -----------------------------------------------------------------------
>                 Key: MAHOUT-228
>                 URL:
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Ted Dunning
>             Fix For: 0.3
>         Attachments: logP.csv, MAHOUT-228-3.patch, r.csv, sgd-derivation.pdf, sgd-derivation.tex,
> Stochastic gradient descent (SGD) is often fast enough for highly scalable learning (see Vowpal Wabbit,
> I often need to have a logistic regression in Java as well, so that is a reasonable place to start.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
