mahout-dev mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Stochastic SVD
Date Tue, 23 Mar 2010 03:12:31 GMT
Actually, maybe what you were thinking (at least, what *I* am thinking) is
that you can indeed do it in one pass through the *original* data (i.e. you
can get away with never keeping a handle on the original data itself),
because on the "one pass" through that data, you spit out MultipleOutputs:
one SequenceFile of the randomly projected data, which doesn't hit a reducer
at all, and a second output which is the outer product of those vectors
with themselves, which hits a summing reducer.

In this sense, if you want to treat the original as data to be played with
(instead of just "training" data for use on a smaller subset or even a
totally different set), you do need a second pass over something of the
original data's *size* (in terms of number of rows), namely the projected
output, but you never need to pass over the entire original data *set*
again.
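
A rough single-machine sketch of that one pass, in plain Python rather than
actual Hadoop code. The assumptions here are mine: "those vectors" is read as
the projected rows, the projection is a dense Gaussian matrix, and the names
`one_pass`, `omega`, and `gram` are made up for illustration. The list
`projected` stands in for the map-only SequenceFile output, and `gram` stands
in for what the summing reducer would accumulate.

```python
import random


def one_pass(rows, k, seed=0):
    """Read each row of A exactly once.

    Returns (projected, gram) where
      projected[i] = a_i * Omega   (length-k projected row, "map-only" output)
      gram         = sum_i outer(projected[i], projected[i])   (k x k)
    Omega is an n x k Gaussian matrix; this choice is an assumption.
    """
    rng = random.Random(seed)
    omega = None  # built lazily once the row width n is known
    projected = []
    gram = [[0.0] * k for _ in range(k)]

    for a in rows:  # works for any iterable, so the input is never re-read
        if omega is None:
            omega = [[rng.gauss(0.0, 1.0) for _ in range(k)]
                     for _ in range(len(a))]
        # Random projection of this row: y = a * Omega
        y = [sum(a[j] * omega[j][c] for j in range(len(a)))
             for c in range(k)]
        projected.append(y)
        # Outer product y y^T, added into the running sum
        for p in range(k):
            for q in range(k):
                gram[p][q] += y[p] * y[q]

    return projected, gram


if __name__ == "__main__":
    data = iter([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # consumed only once
    projected, gram = one_pass(data, k=2)
    print(len(projected), len(gram), len(gram[0]))
```

Anything done afterwards touches only `projected` (same number of rows as the
input, but only k columns) and the small k x k `gram`.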

  -jake

On Mon, Mar 22, 2010 at 6:35 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> You are probably right.  I had a wild hare tromp through my thoughts the
> other day saying that one pass should be possible, but I can't reconstruct
> the details just now.
>
> On Mon, Mar 22, 2010 at 6:00 PM, Jake Mannix <jake.mannix@gmail.com>
> wrote:
>
> > I guess if you mean just do a random projection on the original data, you
> > can certainly do that in one pass, but that's random projection, not a
> > stochastic decomposition.
> >
>
