lucene-java-user mailing list archives

From Walter Underwood <>
Subject Re: Content based recommender using lucene/solr
Date Fri, 28 Jun 2013 18:07:56 GMT
More Like This (MLT) already is kNN. It extracts features from the document (makes a query) and
runs that query against the collection.
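A toy, in-memory Java sketch of the two steps described above: pick the source document's most distinctive terms (by a simple tf-idf weight) as the query, then score the other documents against that query and keep the top k. The class and method names, the regex tokenizer, and the log(1 + N/df) idf formula are assumptions for illustration. Lucene's real `MoreLikeThis` class uses its own analyzers, term-selection parameters, and similarity scoring.

```java
import java.util.*;
import java.util.stream.*;

/** Hypothetical illustration of the MoreLikeThis idea; this is NOT Lucene's API. */
public class MoreLikeThisSketch {

    // Lowercase word tokens; a crude stand-in for a real Lucene analyzer.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Assumed idf formula: log(1 + N / df), where df counts docs containing the term.
    static Map<String, Double> computeIdf(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> d : docs) {
            for (String t : new HashSet<>(d)) df.merge(t, 1, Integer::sum);
        }
        Map<String, Double> idf = new HashMap<>();
        df.forEach((t, n) -> idf.put(t, Math.log(1.0 + (double) docs.size() / n)));
        return idf;
    }

    // Step 1, "extract features": the source doc's top maxTerms terms by tf * idf
    // (ties broken alphabetically) become the query.
    static List<String> buildQuery(List<String> doc, Map<String, Double> idf, int maxTerms) {
        Map<String, Long> tf = doc.stream()
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
        return tf.keySet().stream()
                .sorted(Comparator.comparingDouble((String t) -> -tf.get(t) * idf.get(t))
                        .thenComparing(Comparator.<String>naturalOrder()))
                .limit(maxTerms)
                .collect(Collectors.toList());
    }

    // Step 2, "run the query": score every other doc by the summed idf of each
    // query-term occurrence (i.e. tf * idf), and return the k best doc indices.
    static List<Integer> moreLikeThis(List<String> corpus, int sourceId, int maxTerms, int k) {
        List<List<String>> docs = corpus.stream()
                .map(MoreLikeThisSketch::tokenize)
                .collect(Collectors.toList());
        Map<String, Double> idf = computeIdf(docs);
        Set<String> query = new HashSet<>(buildQuery(docs.get(sourceId), idf, maxTerms));
        double[] scores = new double[docs.size()];
        for (int i = 0; i < docs.size(); i++) {
            for (String t : docs.get(i)) {
                if (query.contains(t)) scores[i] += idf.get(t);
            }
        }
        return IntStream.range(0, docs.size())
                .filter(i -> i != sourceId && scores[i] > 0)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> -scores[i])
                        .thenComparingInt(i -> i))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "red cotton shirt with short sleeves",
                "blue cotton shirt with long sleeves",
                "red leather boots",
                "stainless steel kitchen knife");
        System.out.println(moreLikeThis(corpus, 0, 5, 2)); // prints [1, 2]
    }
}
```

Here doc 1 shares three query terms with doc 0 and ranks first, doc 2 shares only "red", and doc 3 shares nothing and is dropped. Because the "query" is built from the item itself, the ranking is a nearest-neighbor list over term space, which is the point being made about MLT.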

If you want the items most similar to the current item, use MLT.


On Jun 28, 2013, at 11:02 AM, Luis Carlos Guerrero Covo wrote:

> Hey Saikat, thanks for your suggestion. I've looked into Mahout and other
> alternatives for computing k nearest neighbors. I would have to run a job
> to compute the k nearest neighbors and track them in the index for
> retrieval. I wanted to see if this was something I could do with Lucene,
> using Lucene's scoring function and Solr's MoreLikeThis component. The job
> you specifically mention is for item-based recommendation, which would
> require me to track the different items users have viewed. I'm looking for
> a content-based approach where I would use a distance measure to establish
> how near (how similar) items are, and have some kind of training phase to
> adjust weights.
> On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal <> wrote:
>> Why not just use Mahout for this? There is an item similarity algorithm
>> in Mahout that does exactly this :)
>> You can use Mahout in both distributed and non-distributed mode.
>>> From:
>>> Date: Fri, 28 Jun 2013 12:16:57 -0500
>>> Subject: Content based recommender using lucene/solr
>>> To:;
>>> Hi,
>>> I'm using Lucene and Solr right now in a production environment with an
>>> index of about a million docs. I'm working on a recommender that would
>>> show the user the n items most similar to the item he is currently
>>> viewing.
>>> I've been thinking of using Solr/Lucene since I already have all docs
>>> available, and I want a quick version that can be deployed while we work
>>> on a more robust recommender. How about overriding the default similarity
>>> so that it scores documents based on the Euclidean distance of normalized
>>> item attributes, and then using a MoreLikeThis component to pass in the
>>> attributes of the item for which I want to generate recommendations? I
>>> know it has its issues, like recomputing scores/normalization/weight
>>> application at query time, which could make this idea unfeasible or
>>> impractical. I'm at a very preliminary stage right now with this and would
>>> love some suggestions from experienced users.
>>> Thank you,
>>> Luis Guerrero
> -- 
> Luis Carlos Guerrero Covo
> M.S. Computer Engineering
> (57) 3183542047

Walter Underwood
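Luis's original idea (ranking items by the Euclidean distance between normalized attribute vectors, with per-attribute weights adjusted in a training phase) can be sketched outside Lucene as a brute-force k-nearest-neighbor search. The class name, the choice of min-max normalization, and the sample attributes {price, rating, weightKg} are assumptions for illustration; this is not how Solr's MoreLikeThis component scores documents, and no training procedure is shown, only the weights a training phase would tune.

```java
import java.util.*;
import java.util.stream.*;

/** Hypothetical brute-force kNN over normalized numeric item attributes. */
public class EuclideanKnnSketch {

    // Min-max normalize each attribute (column) to [0, 1] so that no attribute
    // dominates the distance just because of its scale. Constant columns map to 0.
    static double[][] normalize(double[][] items) {
        int n = items.length, d = items[0].length;
        double[][] out = new double[n][d];
        for (int j = 0; j < d; j++) {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double[] row : items) {
                min = Math.min(min, row[j]);
                max = Math.max(max, row[j]);
            }
            double range = max - min;
            for (int i = 0; i < n; i++) {
                out[i][j] = range == 0 ? 0.0 : (items[i][j] - min) / range;
            }
        }
        return out;
    }

    // Weighted Euclidean distance: sqrt(sum_j w_j * (a_j - b_j)^2).
    // The weights w_j are the parameters a training phase would adjust.
    static double distance(double[] a, double[] b, double[] weights) {
        double sum = 0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += weights[j] * diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Indices of the k items nearest to items[query], excluding the query item;
    // ties are broken by the lower index.
    static List<Integer> nearest(double[][] items, int query, double[] weights, int k) {
        double[][] x = normalize(items);
        return IntStream.range(0, x.length)
                .filter(i -> i != query)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> distance(x[query], x[i], weights))
                        .thenComparingInt(i -> i))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical attributes per item: {price, rating, weightKg}.
        double[][] items = {
                {20.0, 4.5, 0.30},
                {22.0, 4.4, 0.35},
                {200.0, 3.0, 5.00},
                {25.0, 4.0, 0.40}};
        double[] weights = {1.0, 1.0, 1.0};
        System.out.println(nearest(items, 0, weights, 2)); // prints [1, 3]
    }
}
```

This scans every item for each query, so its cost grows linearly with the collection and it does not use the Lucene index at all. That trade-off is why precomputing neighbors in a batch job, or reusing MLT's term-based query, are the options discussed in this thread.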
