mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <>
Subject Re: Making Very Large-Scale Linear Algebraic Computations Possible Via Randomization
Date Mon, 28 Sep 2009 04:37:24 GMT

Let me know how it goes.  I would love to kibitz but can't code anything.

One thing that I did notice is that after computing y = AR, a critical step
is SVD of y.  This is best done, perhaps, by decomposing y'y since that is a
k x k matrix.

So ... if R is defined by a PRNG with well defined seeds for each row, then
you don't have to store it.  It is particularly nice if you can compute the
n-th random number from a particular seed at high speed.  If you can do
that, then multiplication by a sparse matrix is very clean.  You can
probably do that with most generators by building a moral equivalent of a
skip list ... just save the seed every few hundred elements.  Better to have
a generator that can be iterated more efficiently, but usable.  Note the
comments in the articles about how very poor generators work just fine.

Now in a MR setting, A will probably be split into row-wise patches.  y = AR
is the row-wise concatenation of these.  With the PRNG trick, each mapper
can have an implicit copy of R.  Otherwise, it will have to be distributed
as a side file to all mappers and a merge will be necessary.  But if we are
after y' y as well, we can compute this by taking outer products of each row
of y and summing the resulting k x k matrices.  This works wonderfully
because combiners and such can combine partial map outputs.  You probably
still have to store y itself, but with a few fancy keys and a multiple
output format, you should be good to go.

On Sun, Sep 27, 2009 at 9:03 PM, Jake Mannix <> wrote:

>  That shouldn't be too hard to Hadoop-ify.  I'll see if I can try it out
> while porting over the decomposer stuff.

Ted Dunning, CTO

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message