mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Methods for Naming Clusters
Date Mon, 04 Jan 2010 22:12:22 GMT
By the way, regarding the cost of the decomposition: have you seen the recent
spate of articles on stochastic decomposition?  These methods can dramatically
speed up LSA.

See http://arxiv.org/abs/0909.4061v1 for a good survey.  My guess is that
you don't even need to do the SVD and could just use a random projection
with a single power step (which is nearly equivalent to random indexing).
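
A minimal sketch of what a random projection with a single power step could
look like, assuming the range-finder form Y = (A A^T)^q A Omega from the
survey with q = 1 and a Gaussian test matrix Omega.  The function and
variable names here are hypothetical, and plain Python lists are used only to
keep the sketch dependency-free; a real implementation would use a sparse
matrix library.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


def transpose(a):
    """Return the transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def random_projection(a, k, power_steps=1, seed=0):
    """Map the n rows of `a` (n x d) to k dimensions without an SVD.

    Computes Y = (A A^T)^q A Omega, where Omega is a d x k Gaussian
    matrix and q = power_steps.  The names and defaults are assumptions
    made for this sketch.
    """
    rng = random.Random(seed)
    d = len(a[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
    y = matmul(a, omega)                  # plain random projection, n x k
    a_t = transpose(a)
    for _ in range(power_steps):
        y = matmul(a, matmul(a_t, y))     # one power step: A (A^T Y)
    return y


# Toy document-by-term count matrix (made-up numbers, 4 docs x 5 terms).
docs = [
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 1],
    [0, 0, 3, 1, 0],
    [0, 0, 1, 2, 0],
]
reduced = random_projection(docs, k=2)
for row in reduced:
    print(row)
```

Each row of `reduced` is a k-dimensional description of one document.  The
power step multiplies by A A^T once more, which emphasizes the directions
with the largest singular values relative to a plain random projection.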

On Mon, Jan 4, 2010 at 11:57 AM, Dawid Weiss <dawid.weiss@gmail.com> wrote:

> We agree, it was just me explaining things vaguely. The bottom line
> is: a lot depends on what you're planning to do with the clusters and
> the methodology should be suitable to this.
>
> Dawid
>
>
> On Mon, Jan 4, 2010 at 8:53 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > I think I agree with this for clusters that are intended for human
> > consumption, but I am sure that I disagree with this if you are looking to
> > use the clusters internally for machine learning purposes.
> >
> > The basic idea for the latter is that the distances to a bunch of clusters
> > can be used as a description of a point.  This description in terms of
> > distances to cluster centroids can make some machine learning tasks vastly
> > easier.
> >
> > On Mon, Jan 4, 2010 at 11:44 AM, Dawid Weiss <dawid.weiss@gmail.com> wrote:
> >
> >> What's worse -- neither method is "better". We at Carrot2 have a
> >> strong feeling that clusters should be described properly in order to
> >> be useful, but one may argue that in many, many applications of
> >> clustering, the labels are _not_ important and just individual
> >> features of clusters (like keywords or even documents themselves) are
> >> enough.
> >>
> >
> >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
>



-- 
Ted Dunning, CTO
DeepDyve
