[ https://issues.apache.org/jira/browse/MAHOUT672?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=13134347#comment13134347
]
Jake Mannix commented on MAHOUT672:

> That's the Achilles' heel of much of distributed stuff in Mahout. I.e. space of iterations
(I feel) must be very close to O(1), otherwise it severely affects stuff. Even using side
information is not that painful it seems compared to iteration growth. That severely decreases
pragmatical use.
I think it's a bit extreme to say we need to have nearly O(1) Mapreduce passes to be useful.
Lots of iterative stuff requires quite a few passes before convergence (as you say: Lanczos
and LDA both fall into this realm), yet it's just the price you have to pay sometimes.
This may be similar.
Jonathan, what size inputs have you run this on, with what running time in comparison to the
other algorithms we have? From what I can see, this looks good to commit as well.
> Implementation of Conjugate Gradient for solving large linear systems
> 
>
> Key: MAHOUT672
> URL: https://issues.apache.org/jira/browse/MAHOUT672
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.5
> Reporter: Jonathan Traupman
> Priority: Minor
> Fix For: 0.6
>
> Attachments: 0001MAHOUT672LSMRiterativelinearsolver.patch, 0001MAHOUT672LSMRiterativelinearsolver.patch,
MAHOUT672.patch, mahout672111023.patch, mahout672.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> This patch contains an implementation of conjugate gradient, an iterative algorithm for
solving large linear systems. In particular, it is well suited for large sparse systems where
a traditional QR or Cholesky decomposition is infeasible. Conjugate gradient only works for
matrices that are square, symmetric, and positive definite (basically the same types where
Cholesky decomposition is applicable). Systems like these commonly occur in statistics and
machine learning problems (e.g. regression).
> Both a standard (in memory) solver and a distributed hadoopbased solver (basically the
standard solver run using a DistributedRowMatrix a la DistributedLanczosSolver) are included.
> There is already a version of this algorithm in taste package, but it doesn't operate
on standard mahout matrix/vector objects, nor does it implement a distributed version. I believe
this implementation will be more generically useful to the community than the specialized
one in taste.
> This implementation solves the following types of systems:
> Ax = b, where A is square, symmetric, and positive definite
> A'Ax = b where A is arbitrary but A'A is positive definite. Directly solving this system
is more efficient than computing A'A explicitly then solving.
> (A + lambda * I)x = b and (A'A + lambda * I)x = b, for systems where A or A'A is singular
and/or not full rank. This occurs commonly if A is large and sparse. Solving a system of this
form is used, for example, in ridge regression.
> In addition to the normal conjugate gradient solver, this implementation also handles
preconditioning, and has a sample Jacobi preconditioner included as an example. More work
will be needed to build more advanced preconditioners if desired.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
