[ https://issues.apache.org/jira/browse/MAHOUT672?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=13021145#comment13021145
]
Ted Dunning commented on MAHOUT672:

{quote}
As for a linear regression implementation using CG compared to one using SGD, it would be
hard for me to reach any conclusions without comparing the two approaches head to head on
the same data. CG would probably gain some benefit from being easily parallelizable, but the
individual updates in SGD seem very fast and lightweight, so any speed advantage to CG would
probably only come up for truly massive datasets. The SGD implementation in your patch also
has a lot of regularization support that a simple CG implementation of LMS would lack (ridge
regression i.e. L2 regularization comes for free, but L1 is considerably harder). I'm also
unaware of how one would do the automatic validation/hyperparameter tuning using CG that your
SGD implementation does.
{quote}
The other big difference, btw, is that all of our parallel approaches require at least one
pass through the data. The SGD stuff can stop early and often only needs a small fraction
of the input to converge. That gives sublinear convergence time in terms of input size (which
sounds whacky, but is real). Any approach that needs to read the entire data set obviously
can't touch that scaling factor.
Offsetting this is the idea that if we don't need all the data for a given complexity of model,
then we probably don't want to stop but would rather just have a more complex model. This
is where the nonparametric approaches come in. The would give simple answers with small
inputs and more nuanced answers with large data.
> Implementation of Conjugate Gradient for solving large linear systems
> 
>
> Key: MAHOUT672
> URL: https://issues.apache.org/jira/browse/MAHOUT672
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.5
> Reporter: Jonathan Traupman
> Priority: Minor
> Attachments: MAHOUT672.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> This patch contains an implementation of conjugate gradient, an iterative algorithm for
solving large linear systems. In particular, it is well suited for large sparse systems where
a traditional QR or Cholesky decomposition is infeasible. Conjugate gradient only works for
matrices that are square, symmetric, and positive definite (basically the same types where
Cholesky decomposition is applicable). Systems like these commonly occur in statistics and
machine learning problems (e.g. regression).
> Both a standard (in memory) solver and a distributed hadoopbased solver (basically the
standard solver run using a DistributedRowMatrix a la DistributedLanczosSolver) are included.
> There is already a version of this algorithm in taste package, but it doesn't operate
on standard mahout matrix/vector objects, nor does it implement a distributed version. I believe
this implementation will be more generically useful to the community than the specialized
one in taste.
> This implementation solves the following types of systems:
> Ax = b, where A is square, symmetric, and positive definite
> A'Ax = b where A is arbitrary but A'A is positive definite. Directly solving this system
is more efficient than computing A'A explicitly then solving.
> (A + lambda * I)x = b and (A'A + lambda * I)x = b, for systems where A or A'A is singular
and/or not full rank. This occurs commonly if A is large and sparse. Solving a system of this
form is used, for example, in ridge regression.
> In addition to the normal conjugate gradient solver, this implementation also handles
preconditioning, and has a sample Jacobi preconditioner included as an example. More work
will be needed to build more advanced preconditioners if desired.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
