Hmm... in my experience, the usefulness of SVD lies almost entirely in
the degree to which the metric is *not* preserved... that's the whole
point, or else you get very little out of it: you trade a high-dimensional
sparse computation for a low-dimensional dense one, and if you exactly
preserved the metric you'd basically get nothing.
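To put rough numbers on that, here's a quick numpy sketch (the 0/1
doc-term matrix is random and made up, so treat the exact figures as
illustrative only):

    import numpy as np

    rng = np.random.default_rng(0)
    # made-up sparse 0/1 doc-term matrix: 500 docs, 2000 terms, ~20 terms/doc
    A = (rng.random((500, 2000)) < 0.01).astype(float)
    k = 50

    # Gaussian random projection: Johnson-Lindenstrauss says pairwise
    # distances come out approximately preserved.
    R = rng.normal(size=(2000, k)) / np.sqrt(k)
    X_rp = A @ R

    # rank-k truncated SVD of the same matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    X_svd = U[:, :k] * s[:k]

    # mean relative distortion of distances between 250 disjoint doc pairs
    i, j = np.arange(250), np.arange(250, 500)
    orig = np.linalg.norm(A[i] - A[j], axis=1)
    for name, X in (("random projection", X_rp), ("truncated SVD", X_svd)):
        new = np.linalg.norm(X[i] - X[j], axis=1)
        print(name, np.mean(np.abs(new - orig) / orig))

On unstructured random data like this, the projection's distortion stays
small while the hard truncation shrinks distances drastically; on real
text that shrinkage isn't noise, it's exactly where the clustering
information goes.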
When you notice that for text, n-grams like "software engineer" are now
considerably closer to "c++ developer" than to other n-grams, this gives
you information. You don't get that information from a random projection.
You'll get some of that information from A'AR, because you get
second-order correlations, but then you're still losing all correlations
beyond second-order (and a true eigenvector is getting you the full
infinite series of correlations, properly weighted).
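Here's a toy numpy sketch of both points (the counts and term labels are
invented, and the A'AR-style iteration is included only to illustrate the
"higher orders" bit):

    import numpy as np

    # toy term-doc counts (rows = terms, cols = docs); labels invented.
    # The two job-title terms never share a doc; they're connected only
    # through the two context terms.
    A = np.array([
        [1, 1, 0, 0, 0, 0],   # "software engineer"  (docs 0,1)
        [0, 0, 1, 1, 0, 0],   # "c++ developer"      (docs 2,3)
        [2, 0, 2, 0, 2, 0],   # context, e.g. "compiler"
        [0, 2, 0, 2, 0, 2],   # context, e.g. "debugger"
    ], dtype=float)

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    print(cosine(A[0], A[1]))      # 0.0 -- no direct co-occurrence at all

    # rank-2 LSA: term coordinates are U_k * Sigma_k
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    T = U[:, :2] * s[:2]
    print(cosine(T[0], T[1]))      # ~1.0 -- pulled together by shared context

    # A'AR is one multiply by the Gram matrix: second-order correlations.
    # Iterating that multiply is just power iteration, which climbs the
    # whole series and converges to the true singular vectors:
    rng = np.random.default_rng(0)
    B = rng.normal(size=(A.shape[1], 2))
    for _ in range(50):
        B = A.T @ (A @ B)          # one step = one more order of correlation
        B, _ = np.linalg.qr(B)     # keep the columns well-conditioned
    print(np.abs(B.T @ Vt[:2].T))  # ~identity: spans the top-2 singular dirs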
I mean, I guess you can use SVD purely for dimensional reduction, but like
you say, plain reduction can be done lots of other, more efficient ways.
Doing a reduction which enhances co-occurrence relationships and distorts
the metric to produce better clusters than you started with is what SVD,
NMF, and LDA were designed for.
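(Purely as a sketch of the mechanics of that trade, with made-up sizes --
scipy's Lanczos-based svds here, nothing Mahout-specific:)

    import scipy.sparse as sp
    from scipy.sparse.linalg import svds

    # made-up sizes: 10k docs x 50k terms, ~5 nonzeros per row
    X = sp.random(10_000, 50_000, density=1e-4, format='csr', random_state=0)
    U, s, Vt = svds(X, k=100)      # Lanczos-style truncated SVD
    docs_low = U * s               # 10k x 100 dense rows replace the sparse ones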
Maybe I'm missing your point?
jake
On Mon, Jan 4, 2010 at 2:44 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> SVD is (approximately) metric-preserving while also dimensionality
> reducing. If you use A'AR instead of the actual term eigenvectors you
> should get similar results.
>
> On Mon, Jan 4, 2010 at 2:21 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
>
> > Ted, how would just doing a random projection do the right thing? It's a
> > basically metric-preserving technique, and one of the primary reasons to
> > *do* LSA is to use a *different* metric (one in which "similar" terms are
> > nearer to each other than would be otherwise imagined).
> >
>
>
>
> 
> Ted Dunning, CTO
> DeepDyve
>
