mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Using mahout to cluster terms in Lucene
Date Tue, 29 Sep 2009 21:15:03 GMT
Heh.  What Ted said, but longer-winded.

On Tue, Sep 29, 2009 at 2:13 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Another way to do this through the back door is to transpose the document
> set so that you have a list of documents for each term.  Index this and
> cluster it just as if it were normal documents and you will have a form of
> term clustering.
>
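
A rough illustration of the transpose Ted describes, as a toy sketch in plain
Python rather than anything from Mahout or Lucene. The function name
`transpose` and the sample documents are made up for the example, and
whitespace tokenization is an assumption; a real setup would read terms from
the Lucene index. Each term ends up as a pseudo-document whose "words" are the
ids of the documents it appears in, which can then be indexed and clustered
like any other document.

```python
from collections import defaultdict


def transpose(docs):
    """Map each term to the sorted list of ids of the documents containing it.

    `docs` is a dict of doc_id -> text. Tokenization here is a naive
    lowercase whitespace split (an assumption made for this sketch).
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical sample corpus.
docs = {
    "d1": "apache lucene search",
    "d2": "lucene search index",
    "d3": "hadoop cluster",
}

term_docs = transpose(docs)

# One pseudo-document per term: its text is the list of document ids.
pseudo_docs = {term: " ".join(ids) for term, ids in term_docs.items()}
```

Terms that share many document ids produce similar pseudo-documents, so
whatever document clustering is already in place should tend to group them.
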
> On Tue, Sep 29, 2009 at 1:05 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>
> > The LDA implementation kind of clusters on terms to generate topics.  It
> > sounds like you want some co-occurrence analysis; I'm not sure that the
> > clustering algorithms are best for that, but perhaps others have insight.
> > I could imagine doing this with HBase or Pig and just keeping a matrix
> > where each cell keeps track of the number of times both terms appear in a
> > document (or even within some window in a document).
> >
> >
> >
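
The co-occurrence matrix Grant mentions could look roughly like this at toy
scale. This is an in-memory sketch only, not the HBase or Pig version; the
function name `cooccurrence` and the sample documents are invented for the
example, and whitespace tokenization is an assumption. Each cell, keyed by an
alphabetically ordered term pair, counts the documents in which both terms
appear.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count, for every pair of distinct terms, the documents containing both.

    `docs` is an iterable of text strings. Pairs are stored as (a, b) with
    a < b alphabetically, so only one triangle of the symmetric matrix is
    kept.
    """
    counts = Counter()
    for text in docs:
        # Deduplicate so a pair is counted at most once per document.
        terms = sorted(set(text.lower().split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical sample corpus.
counts = cooccurrence([
    "apache lucene search",
    "lucene search index",
    "hadoop cluster",
])
```

Counting within a window instead of a whole document would mean sliding over
the token list and pairing only terms that fall inside the same window.
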
> > On Sep 29, 2009, at 8:57 AM, Ole-Martin Mørk wrote:
> >
> >> Hi.
> >> I have been using org.apache.mahout.utils.vectors.lucene.Driver
> >> and org.apache.mahout.clustering.kmeans.KMeansDriver to cluster documents
> >> in our Lucene index and it works great! I am wondering though, is it
> >> possible to use Mahout to cluster terms?
> >>
> >> I want to cluster terms that often appear in the same documents.
> >>
> >> Thank you.
> >>
> >> --
> >> Ole-Martin Mørk
> >> http://twitter.com/olemartin
> >> http://flickr.com/olemartin
> >>
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> > Solr/Lucene:
> > http://www.lucidimagination.com/search
> >
> >
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
