mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Methods for Naming Clusters
Date Mon, 04 Jan 2010 23:37:24 GMT
Fair enough.

I bet that, like many other situations in this field, it's highly
data-dependent.  For N x M matrices with d being the mean number of nonempty
entries per row: if d is fairly small and N and M grow to be very, very
large, "densification" to take higher-order correlations into account
becomes more and more necessary.  For text, d may be small, but due to Zipf
there's that meaty set of columns which has a pretty nice density (not
stop-word density, but way denser than the (N/M)*d mean density), which
might provide enough support to do natural smoothing at one step of
learning.
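
To make that "one step of learning" concrete, here's a toy numpy sketch of
the kind of densification I mean (the little doc-term matrix is made up, and
this isn't Mahout code, just an illustration):

    import numpy as np

    # Toy doc-term matrix (4 docs x 5 terms), deliberately sparse.
    # doc0 and doc2 share no terms directly, but both overlap with doc1.
    A = np.array([
        [1, 1, 0, 0, 0],   # doc0: "software", "engineer"
        [0, 1, 1, 0, 0],   # doc1: "engineer", "c++"
        [0, 0, 1, 1, 0],   # doc2: "c++", "developer"
        [0, 0, 0, 0, 1],   # doc3: something unrelated
    ], dtype=float)

    print((A @ A.T)[0, 2])    # 0.0: doc0 and doc2 have no direct overlap
    print(A @ (A.T @ A))      # doc0's row now has weight on "c++" via "engineer"

The raw doc0/doc2 similarity is zero, but after one pass through the
term-term co-occurrence matrix A'A, doc0 picks up weight on terms it never
contained, which is where that meaty set of dense columns earns its keep.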

For web graphs and most social graphs, getting data beyond 2 degrees is
pretty critical for all of the lightly connected nodes (they don't see much
before 3 degrees).

Do you know of any nice relevance comparisons of un-projected vs. one-step
learning vs. random projection vs. SVD?  For text or recommenders or
anything else?
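
Here's roughly the kind of head-to-head I have in mind, as a hedged toy
sketch in numpy terms rather than anything that exists in Mahout (the
matrix, k, and R are all made up, and "one-step" below is just my reading of
Ted's A'AR):

    import numpy as np

    rng = np.random.default_rng(0)
    # toy 0/1 doc-term matrix, mostly zeros (dense ndarray for simplicity)
    A = (rng.random((1000, 500)) < 0.01).astype(float)
    k = 50
    R = rng.standard_normal((A.shape[1], k)) / np.sqrt(k)

    X_raw = A                      # un-projected
    X_rp  = A @ R                  # plain random projection
    X_one = A @ (A.T @ (A @ R))    # one-step learning: docs projected through A'AR
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    X_svd = A @ Vt[:k].T           # rank-k SVD projection

    # A real comparison would score each X_* on the same labeled task
    # (classification accuracy, precision-at-k for a recommender, ...).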

  -jake

On Mon, Jan 4, 2010 at 3:22 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> I agree with everything you say.
>
> Except,
>
> I have found that A'AR gives an awful lot of what you need and may even be
> better in some ways than a full SVD.
>
> The *assumption* that incorporating all n-th degree connections is better
> in terms of results is just that.  Whether it actually is better is a
> matter of conjecture and I certainly don't have really strong evidence
> either way.  The random indexing community claims to have really good
> results.  The LSA community claims some good results, notably Netflix
> results.  My own experience in document classification is that the results
> with A'AR (what we used to call one-step learning) and SVD are really,
> really similar.
>
> On Mon, Jan 4, 2010 at 2:55 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
>
> > When you notice that for text, ngrams like "software engineer" are now
> > considerably closer to "c++ developer" than to other ngrams, this gives
> > you information.  You don't get that information from a random
> > projection.  You'll get some of that information from A'AR, because you
> > get second-order correlations, but then you're still losing all
> > correlations beyond second-order (and a true eigenvector is getting you
> > the full infinite series of correlations, properly weighted).
> >
> > I mean, I guess you can use SVD purely for dimensional reduction, but
> > like you say, reduction can be done in lots of other, more efficient
> > ways.  Doing a reduction which enhances co-occurrence relationships and
> > distorts the metric to produce better clusters than what you started
> > with is something that SVD, NMF, and LDA were designed for.
> >
> > Maybe I'm missing your point?
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
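
One more toy sketch, on the "full infinite series of correlations" point in
the quoted thread above: repeatedly applying A'A (i.e. power iteration) is
what folds in the 3rd-, 4th-, ... order co-occurrences, and it converges to
the top right singular vector.  Again a hedged, made-up numpy example, not
Mahout code:

    import numpy as np

    rng = np.random.default_rng(1)
    A = (rng.random((200, 100)) < 0.05).astype(float)   # toy sparse matrix

    v = rng.standard_normal(A.shape[1])
    for _ in range(20):            # power iteration on A'A
        v = A.T @ (A @ v)          # each pass adds one more order of co-occurrence
        v /= np.linalg.norm(v)

    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    print(abs(v @ Vt[0]))          # should be close to 1, given a spectral gap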
