lucene-java-user mailing list archives

From Luis Carlos Guerrero Covo <lcguerreroc...@gmail.com>
Subject Re: Content based recommender using lucene/solr
Date Fri, 28 Jun 2013 18:02:00 GMT
Hey Saikat, thanks for your suggestion. I've looked into Mahout and other
alternatives for computing k nearest neighbors. I would have to run a job
to compute the k nearest neighbors and store them in the index for
retrieval. I wanted to see if this was something I could do with Lucene
alone, using Lucene's scoring function and Solr's MoreLikeThis component.
The job you mention is for item-based recommendation, which would
require me to track the different items users have viewed. I'm looking for
a content-based approach where I would use a distance measure to establish
how near (how similar) items are, with some kind of training phase to
adjust the attribute weights.
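
A minimal sketch of the content-based idea described above, written in plain
Python rather than against the Lucene or Solr APIs. The item ids, the
attribute names (`price`, `rooms`), the min-max normalization, and the
`weights` dictionary are all assumptions made for illustration; the weights
stand in for whatever a training phase would produce.

```python
import heapq
import math

# Hypothetical catalogue: item id -> raw numeric attributes (illustrative only).
ITEMS = {
    "a": {"price": 100, "rooms": 2},
    "b": {"price": 110, "rooms": 2},
    "c": {"price": 300, "rooms": 5},
    "d": {"price": 120, "rooms": 3},
}


def normalize(items):
    """Min-max scale every attribute to [0, 1] across all items."""
    keys = sorted({k for attrs in items.values() for k in attrs})
    lo = {k: min(a.get(k, 0.0) for a in items.values()) for k in keys}
    hi = {k: max(a.get(k, 0.0) for a in items.values()) for k in keys}
    return {
        item_id: {
            # A constant attribute carries no information, so map it to 0.
            k: (attrs.get(k, 0.0) - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
            for k in keys
        }
        for item_id, attrs in items.items()
    }


def distance(a, b, weights):
    """Weighted Euclidean distance; attributes without a weight default to 1.0."""
    return math.sqrt(sum(weights.get(k, 1.0) * (a[k] - b[k]) ** 2 for k in a))


def most_similar(item_id, items, weights, n=2):
    """Return the ids of the n items nearest to item_id, closest first."""
    vectors = normalize(items)
    target = vectors[item_id]
    candidates = (
        (distance(target, vec, weights), other)
        for other, vec in vectors.items()
        if other != item_id
    )
    return [other for _, other in heapq.nsmallest(n, candidates)]


if __name__ == "__main__":
    print(most_similar("a", ITEMS, weights={}, n=2))
```

This brute-force version scans every item per request, which is the same
query-time cost concern raised in the original message. Precomputing the
neighbors in a batch job and storing them in the index trades freshness for
lookup speed.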


On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal <sxk1969@hotmail.com> wrote:

> Why not just use Mahout for this? It has an item similarity algorithm
> that does exactly this :)
>
>
> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
>
> You can use Mahout in either distributed or non-distributed mode.
>
> > From: lcguerrerocovo@gmail.com
> > Date: Fri, 28 Jun 2013 12:16:57 -0500
> > Subject: Content based recommender using lucene/solr
> > To: solr-user@lucene.apache.org; java-user@lucene.apache.org
> >
> > Hi,
> >
> > I'm using Lucene and Solr right now in a production environment with an
> > index of about a million docs. I'm working on a recommender that
> > basically would list the n most similar items to the user based on the
> > current item he is viewing.
> >
> > I've been thinking of using Solr/Lucene since I already have all docs
> > available and I want a quick version that can be deployed while we work
> > on a more robust recommender. How about overriding the default similarity
> > so that it scores documents based on the Euclidean distance of normalized
> > item attributes and then using a MoreLikeThis component to pass in the
> > attributes of the item for which I want to generate recommendations? I
> > know it has its issues, like recomputing scores/normalization/weight
> > application at query time, which could make this idea infeasible or
> > impractical. I'm at a very preliminary stage right now with this and
> > would love some suggestions from experienced users.
> >
> > thank you,
> >
> > Luis Guerrero
>
>



-- 
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047
