Fair enough.
I bet that, like many other situations in this field, it's highly
data-dependent. For N x M matrices with d being the mean number of
non-empty entries per row: if d is fairly small, and N and M grow to be
very large, "densification" to capture the higher-order correlations
becomes more important. For text, d may be small, but thanks to Zipf
there's that meaty set of columns with a pretty nice density (not
stopword density, but way denser than the (N/M)*d mean column density),
which might provide enough support to do natural smoothing at one step
of learning.
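
To put rough numbers on that (a toy sketch of mine, sizes invented; the
point is just the shape of the Zipf curve):

import numpy as np

# Toy corpus shape (invented): 1M docs, 100K terms, d = 50 terms/doc.
N, M, d = 1_000_000, 100_000, 50
total_nonzeros = N * d

# Zipfian term frequencies: the count for rank r is proportional to 1/r.
ranks = np.arange(1, M + 1)
weights = 1.0 / ranks
col_counts = total_nonzeros * weights / weights.sum()

mean_col_density = (N / M) * d     # average nonzeros per column: 500
print(mean_col_density)
print(col_counts[99])              # rank-100 term: ~41,000 nonzeros

The rank-100-ish columns are the meaty ones: past the stopwords, but
still ~80x denser than the mean column.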
For web graphs and most social graphs, getting data beyond 2 degrees is
pretty critical for all of the lightly connected nodes (they don't see
much before 3 degrees).
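
Toy illustration (random graph, parameters invented) of how little a
degree-1 node sees within 2 hops:

import numpy as np
from scipy.sparse import random as sparse_random

# Toy random graph: ~10K nodes, mean degree ~4.
n = 10_000
A = sparse_random(n, n, density=2e-4, format="csr", random_state=0)
A = ((A + A.T) > 0).astype(np.int8)      # symmetric 0/1 adjacency

def seen_within(A, node, hops):
    # Count nodes reachable from `node` in at most `hops` steps.
    frontier = np.zeros(A.shape[0], dtype=np.int8)
    frontier[node] = 1
    seen = frontier.copy()
    for _ in range(hops):
        frontier = (A @ frontier > 0).astype(np.int8)
        seen |= frontier
    return int(seen.sum()) - 1           # exclude the start node

deg = np.asarray(A.sum(axis=1)).ravel()
node = int(np.flatnonzero(deg == 1)[0])  # a lightly connected node
for k in (1, 2, 3):
    print(k, "hops:", seen_within(A, node, k))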
Do you know of any nice relevance comparisons of unprojected vs.
one-step learning vs. random projection vs. SVD? For text or
recommenders or anything else?
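
To make the comparison concrete, here's the kind of toy harness I have
in mind (my sketch; random data stands in for a real doc-term matrix,
so the printed numbers are noise; swap in a real A to measure anything
real):

import numpy as np

rng = np.random.default_rng(42)
N, M, k = 2000, 500, 50                  # docs, terms, target dimension
A = (rng.random((N, M)) < 0.02).astype(float)  # stand-in doc-term matrix

# 1. Plain random projection of the term-doc vectors (rows of A').
R_docs = rng.standard_normal((N, k)) / np.sqrt(k)
rand_proj = A.T @ R_docs                 # M x k

# 2. "One-step" A'AR: project through the co-occurrence matrix A'A,
#    picking up second-order correlations without materializing A'A.
R_terms = rng.standard_normal((M, k)) / np.sqrt(k)
one_step = A.T @ (A @ R_terms)           # M x k

# 3. Rank-k SVD term embeddings (the full, properly weighted series of
#    correlations).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
svd_terms = Vt[:k].T * s[:k]             # M x k

def cosine(E, i, j):
    u, v = E[i], E[j]
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

for name, E in (("random projection", rand_proj),
                ("one-step A'AR", one_step),
                ("rank-k SVD", svd_terms)):
    print(f"{name:>18}: cos(term 0, term 1) = {cosine(E, 0, 1):.3f}")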
jake
On Mon, Jan 4, 2010 at 3:22 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> I agree with everything you say.
>
> Except,
>
> I have found that A'AR gives an awful lot of what you need and may even be
> better in some ways than a full SVD.
>
> The *assumption* that incorporating all nth-degree connections is better
> in terms of results is just that. Whether it actually is better is a
> matter of conjecture, and I certainly don't have really strong evidence
> either way. The random indexing community claims to have really good
> results. The LSA community claims some good results, notably Netflix
> results. My own experience in document classification is that the
> results with A'AR (what we used to call one-step learning) and SVD are
> really, really similar.
>
> On Mon, Jan 4, 2010 at 2:55 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
>
> > When you notice that for text, n-grams like "software engineer" are
> > now considerably closer to "c++ developer" than to other n-grams, this
> > gives you information. You don't get that information from a random
> > projection. You'll get some of that information from A'AR, because you
> > get second-order correlations, but then you're still losing all
> > correlations beyond second-order (and a true eigenvector is getting
> > you the full infinite series of correlations, properly weighted).
> >
> > I mean, I guess you can use SVD purely for dimensional reduction, but
> > like you say, reduction can be done lots of other, more efficient
> > ways. Doing a reduction which enhances co-occurrence relationships and
> > distorts the metric to produce better clusters than what you started
> > with is something that SVD, NMF, and LDA were designed for.
> >
> > Maybe I'm missing your point?
> >
>
>
>
> 
> Ted Dunning, CTO
> DeepDyve
>
