mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <>
Subject Re: Making Very Large-Scale Linear Algebraic Computations Possible Via Randomization
Date Tue, 29 Sep 2009 00:25:59 GMT
Yes, but y probably has about as much data as A had.  A is sparse, y is
dense, but they have about the same number of rows.  Y will have dozens to
hundreds of non-zero elements per row which is probably comparable to A.

This means that dealing with y'y implicitly at the cost of passing over y is
a major loss.  Better to go ahead and form y' y on the fly as y is formed,
do an in-memory eigenvector computation and then pass over y to get Q (in
the original algorithm notation).  Q is also quite large.

Forming Q' A can be done in a merge step similar to the production of AR to
get S which is wide rather than tall like y.  As such, it is probably better
to compute S' and proceed as before.  It isn't clear to me just off-hand how
this might be done with row-wise access to A, but that would be a key step..

On Mon, Sep 28, 2009 at 1:05 PM, Jake Mannix <> wrote:

> When decomposing y'y, I tend to use a similar trick to this, which is
> probably
> equivalent: to multiply matrix by vector (y'y)*u, just do the vector sum of
> y_i * ( over the rows y_i of y (in which case you store y, but
> never
> bother to store y'y - and just like the outer-product sum, the vector sum
> can
> be done partially in the combiners as well).   Of course, y'y is much
> smaller
> than y, so avoiding saving y'y is not really that much of a win - but or
> this
> algorithm, it seems you do probably need both y and y'y, in which case it
> might be nice to not have to ever really deal with y'y explicitly.

Ted Dunning, CTO

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message