mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Generating a Document Similarity Matrix
Date Tue, 08 Jun 2010 22:52:07 GMT
Is the code in Mahout CF really doing that?  I don't think that's right; we
don't do anything that fancy right now, do we, Sean?

  -jake

On Tue, Jun 8, 2010 at 3:39 PM, Sebastian Schelter
<ssc.open@googlemail.com> wrote:

> Hi Kris,
>
> actually, the code to compute the item-to-item similarities in the
> collaborative filtering part of Mahout (which at first glance seems to be
> a totally different problem from yours) is based on a paper that deals with
> computing the pairwise similarity of text documents in a very simple way.
> Maybe that could be helpful to you:
>
> Elsayed et al: Pairwise Document Similarity in Large Collections with
> MapReduce
>
> http://www.umiacs.umd.edu/~jimmylin/publications/Elsayed_etal_ACL2008_short.pdf
>
> -sebastian
>
>
> 2010/6/8 Kris Jack <mrkrisjack@gmail.com>
>
> > Hi everyone,
> >
> > I currently use Lucene's moreLikeThis function through Solr to find
> > documents that are related to one another.  A single call, however, takes
> > around 4 seconds to complete and I would like to reduce this.  I got to
> > thinking that I might be able to use Mahout to generate a document
> > similarity matrix offline that could then be looked up in real time for
> > serving.  Is this a reasonable use of Mahout?  If so, what functions will
> > generate a document similarity matrix?  Also, I would like to be able to
> > keep the text processing advantages provided through Lucene, so it would
> > help if I could still use my Lucene index.  If not, then could you
> > recommend any alternative solutions please?
> >
> > Many thanks,
> > Kris
> >
>
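
For readers following the thread, the general idea in the Elsayed et al. paper
cited above can be sketched in a few lines. This is a single-machine
illustration only, not Mahout code and not a MapReduce job: it builds an
inverted index from term weights, then sums the weight products of every
document pair that shares a term. The function name pairwise_similarity, the
input layout (document id mapped to a term/weight dictionary), and the toy
weights are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_similarity(docs):
    """Return {(doc_a, doc_b): dot product of term weights}, with doc_a < doc_b.

    docs maps a document id to a {term: weight} dictionary (hypothetical layout).
    """
    # Step 1: inverted index, term -> list of (doc_id, weight) postings.
    postings = defaultdict(list)
    for doc_id, weights in docs.items():
        for term, weight in weights.items():
            postings[term].append((doc_id, weight))

    # Step 2: each term contributes weight_a * weight_b to every pair of
    # documents that contain it; summing over terms gives the dot product.
    sims = defaultdict(float)
    for plist in postings.values():
        for (a, wa), (b, wb) in combinations(sorted(plist), 2):
            sims[(a, b)] += wa * wb
    return dict(sims)


# Toy input with made-up weights, purely for illustration.
docs = {
    "d1": {"mahout": 1.0, "lucene": 2.0},
    "d2": {"lucene": 3.0, "solr": 1.0},
    "d3": {"mahout": 2.0, "solr": 4.0},
}
sims = pairwise_similarity(docs)
```

Pairs that share no term never appear in the result, so the output stays
sparse. A table like this could be computed offline and looked up at serving
time, which is the use case Kris describes.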
