mahout-user mailing list archives

From Ted Dunning
Subject Re: modern logistic regression alternatives
Date Wed, 17 Aug 2011 17:18:09 GMT
The SGD family supports L1-regularized logistic regression.  The
regularization is pluggable, although there is a pending bug with L2 support.
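
To make "pluggable regularization" concrete, here is a minimal sketch in plain
Python.  It is not Mahout's API: the names train_sgd, l1_shrink and l2_shrink,
and the default rate and penalty values, are assumptions made up for this
illustration.  Each update takes a gradient step on the log-likelihood and then
hands the touched weights to whatever shrinkage function was plugged in.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def l1_shrink(w, amount):
    # Soft threshold: move w toward zero by `amount`, stopping at zero.
    if w > amount:
        return w - amount
    if w < -amount:
        return w + amount
    return 0.0

def l2_shrink(w, amount):
    # Multiplicative decay, the gradient step for a squared-norm penalty.
    return w * (1.0 - amount)

def train_sgd(examples, num_features, shrink=l1_shrink,
              lam=0.01, rate=0.1, epochs=20, seed=0):
    """examples: list of (x, y) with x a sparse dict {index: value}, y in {0, 1}.

    Only the weights touched by an example are updated and shrunk, so the
    cost per example depends on its non-zeros, not on num_features.
    """
    rng = random.Random(seed)
    weights = [0.0] * num_features
    examples = list(examples)
    for _ in range(epochs):
        rng.shuffle(examples)
        for x, y in examples:
            z = sum(weights[i] * v for i, v in x.items())
            err = y - sigmoid(z)
            for i, v in x.items():
                weights[i] = shrink(weights[i] + rate * err * v, rate * lam)
    return weights

# Toy data: feature 0 marks positives, feature 1 marks negatives,
# feature 2 appears everywhere and carries no signal.
positives = [({0: 1.0, 2: 1.0}, 1)] * 10
negatives = [({1: 1.0, 2: 1.0}, 0)] * 10
weights = train_sgd(positives + negatives, num_features=4)
```

Swapping shrink=l2_shrink into the same call changes the penalty without
touching the update loop.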

The interior point methods that you suggest are typically fast on problems
that do not need to scale.  The difficulty is that they require the entire
data set to fit in memory on a single machine.

SGD approaches are often much faster in spite of being first-order instead
of second-order techniques, especially when the data exceeds memory size.  If
you have sparse, long-tail feature sets, SGD approaches can be sped up
further by simple model averaging.
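
As a rough sketch of what simple model averaging can look like (plain Python,
not Mahout code; split_into_shards, average_models, train_averaged and the
train_fn argument are hypothetical names for this example): train one model
per shard of the data independently, then average the coefficient vectors
element by element.

```python
def split_into_shards(examples, num_shards):
    # Round-robin split so each shard sees a similar mix of examples.
    return [examples[i::num_shards] for i in range(num_shards)]

def average_models(models):
    # Element-wise mean of equally sized coefficient vectors.
    count = len(models)
    size = len(models[0])
    return [sum(m[i] for m in models) / count for i in range(size)]

def train_averaged(examples, num_shards, train_fn):
    """train_fn maps a list of examples to a coefficient vector (a list).

    The per-shard training runs are independent of each other, so they
    could run on separate machines; here they simply run one after another.
    """
    shards = split_into_shards(examples, num_shards)
    models = [train_fn(shard) for shard in shards]
    return average_models(models)
```

Any trainer that returns a fixed-length list of coefficients can be passed as
train_fn.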

Mahout's SGD logistic regression implementation uses dense matrices for
the coefficients, but is very happy to accept sparse inputs.  Typically, the
inputs are the result of feature-hashed encoding, which means that very, very
large input vocabularies are quite usable.
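
A minimal illustration of the idea in plain Python, not Mahout's encoder
classes: hash_index, encode, score, the use of MD5 and the probes parameter
are all assumptions chosen for this sketch.  Every token is hashed into one
of a fixed number of buckets, so the dense coefficient vector stays the same
size no matter how large the vocabulary grows, while each input remains a
small sparse map.

```python
import hashlib

def hash_index(name, num_buckets, probe=0):
    # Stable hash of the feature name; Python's built-in hash() is salted
    # per process for strings, so a digest is used instead.
    digest = hashlib.md5(f"{probe}:{name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def encode(tokens, num_buckets, probes=1):
    """Return a sparse vector {bucket: count} for a list of string tokens.

    With probes > 1 each token is added at several hashed locations, one
    common way to soften the effect of collisions.
    """
    vec = {}
    for tok in tokens:
        for p in range(probes):
            i = hash_index(tok, num_buckets, p)
            vec[i] = vec.get(i, 0.0) + 1.0
    return vec

def score(weights, vec):
    # Dot product of dense coefficients with a sparse input.
    return sum(weights[i] * v for i, v in vec.items())

# The coefficient vector has a fixed length, independent of vocabulary size.
num_buckets = 1000
weights = [0.0] * num_buckets
x = encode(["modern", "logistic", "regression", "logistic"], num_buckets)
```

The cost of score depends only on the number of non-zero entries in the
encoded input.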

I also find the very simple nature of the SGD codes attractive.

But all that said, if you have some suggestions, especially with useful
code, please speak up.

On Wed, Aug 17, 2011 at 9:19 AM, Patrick Harrington wrote:

> Hey Everyone,
> So I have a request that may have been fielded already but I felt compelled
> to inquire...
> Logistic regression is obviously a popular tool for classification.
>  However, when confronted with modern problems where, post initial feature
> selection, we are still confronted with 10^6-10^7 features, using straight
> up LR is inappropriate as the solution is most likely embedded in a sparse
> linear subspace.
> L1 logistic regression adds an L1-norm penalty on the regression
> coefficients such that, when penalized "enough" many coefficients are
> "thresholded to zero" resulting in a simpler classifier that tends to
> generalize to out of sample test data better than a full fit model.
> The interior point method with a preconditioned gradient Newton step
> approximation of Boyd et al. is what I would call the "state of the art" of
> L1 LR.
> It also accepts data matrices in a sparse MatrixMarket format (most common
> sparse matrix compression) whereas the LR mahout implementation is a dense
> matrix.
> Are there any pushes to implement an L1 logistic regression solver in the
> Mahout libraries?  Obviously any form of LR is serial in nature, but certain
> operations within the Newton step approximation, for instance, can be
> parallelized.
> Any thoughts or visions on moving in this direction are welcomed.
> Very Best,
> Patrick Harrington
> Patrick Harrington, Ph.D. | Sr. Data Scientist | OneRiot
> 1050 Walnut Street, Suite 202 | Boulder, CO 80302
> 303.938.3071 Direct | 517.881.0628 Cell  | 303.938.3060 Fax
