lucene-java-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Content based recommender using lucene/solr
Date Sun, 30 Jun 2013 00:50:39 GMT
Solr/Lucene has two features for this:
1) the MoreLikeThis code, and
2) the clustering project in solr/contrib.
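
For the first option, a minimal sketch of how a MoreLikeThis request could be
built is shown here. It assumes a MoreLikeThis request handler has been
registered at /mlt in solrconfig.xml; the host, the core name "items", the
document id, and the field names "title" and "description" are all
hypothetical. The parameters mlt.fl, mlt.mintf, and mlt.mindf are standard
MoreLikeThis parameters, but the values are placeholders to tune.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, doc_id, fields, rows=10, min_tf=1, min_df=1):
    """Build a URL for a Solr MoreLikeThis handler (assumed to live at /mlt)."""
    params = {
        "q": f"id:{doc_id}",          # the item the user is currently viewing
        "mlt.fl": ",".join(fields),   # fields used to measure similarity
        "mlt.mintf": min_tf,          # minimum term frequency in the source doc
        "mlt.mindf": min_df,          # minimum document frequency in the index
        "rows": rows,                 # number of similar documents to return
        "wt": "json",
    }
    return f"{base_url}/mlt?{urlencode(params)}"


# Hypothetical host, core, id, and fields, for illustration only.
url = build_mlt_url(
    "http://localhost:8983/solr/items", "12345", ["title", "description"], rows=5
)
print(url)
```

The function only assembles the URL; fetching it and parsing the response is
left to whatever HTTP client is already in use.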

Lance

On 06/28/2013 11:15 AM, Luis Carlos Guerrero Covo wrote:
> I only have about a million docs right now, so scaling is not a big issue.
> I'm looking to provide a quick implementation and then worry about scale
> when I get around to implementing a more robust recommender. I'm looking at
> a content-based approach because we are not tracking users and the items
> they view. I was thinking of using MoreLikeThis, as Walter mentioned, but
> wanted some feedback on the nuances required for a proper implementation,
> such as having a similarity based on Euclidean distance, normalizing numerical
> field values, and computing collection-wide stats like mean and variance.
> Thank you for the link, Otis; I will watch it right away.
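
The normalization and distance steps described in the quoted message can be
sketched outside of Lucene as follows. This is only an illustration of the
idea, not a Lucene Similarity implementation: it computes collection-wide mean
and standard deviation per numeric attribute, z-scores each item, and ranks
neighbors by Euclidean distance. The item ids, the attribute values, and the
function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Normalize each column using collection-wide mean and std deviation."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def nearest(items, target_id, k=3):
    """Return the ids of the k items closest to target_id by Euclidean distance."""
    ids = list(items)
    normed = dict(zip(ids, zscore_columns([items[i] for i in ids])))
    target = normed[target_id]
    scored = [(math.dist(target, normed[i]), i) for i in ids if i != target_id]
    return [i for _, i in sorted(scored)[:k]]


# Hypothetical items with two numeric attributes on very different scales.
items = {"a": [1.0, 100.0], "b": [1.1, 110.0], "c": [5.0, 900.0]}
print(nearest(items, "a", k=2))
```

Without the z-score step the second attribute would dominate the distance
simply because its values are larger, which is why the collection-wide stats
matter here.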
>
>
> On Fri, Jun 28, 2013 at 1:12 PM, Otis Gospodnetic <
> otis.gospodnetic@gmail.com> wrote:
>
>> Hi,
>>
>> It doesn't have to be one or the other.  In the past I've built a news
>> recommender engine based on CF (Mahout) and combined it with a content
>> similarity-based engine (it wasn't Solr/Lucene, but something custom that
>> worked with ngrams, though it may as well have been Lucene/Solr/ES).  It
>> worked well.  If you haven't worked with Mahout before, I'd suggest starting
>> with the approach in that video and moving to Mahout only if that approach
>> proves limiting.
>>
>> See Ted's stuff on this topic, too:
>> http://www.slideshare.net/tdunning/search-as-recommendation +
>> http://berlinbuzzwords.de/sessions/multi-modal-recommendation-algorithms
>> (note: Mahout, Solr, Pig)
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
>>
>>
>>
>> On Fri, Jun 28, 2013 at 2:07 PM, Saikat Kanjilal <sxk1969@hotmail.com>
>> wrote:
>>> You could build a custom recommender in Mahout to accomplish this. Also,
>>> just out of curiosity, why the content-based approach as opposed to building
>>> a recommender based on co-occurrence? One other thing: what is your data
>>> size? Are you looking at a scale where you need something like Hadoop?
>>>> From: lcguerrerocovo@gmail.com
>>>> Date: Fri, 28 Jun 2013 13:02:00 -0500
>>>> Subject: Re: Content based recommender using lucene/solr
>>>> To: solr-user@lucene.apache.org
>>>> CC: java-user@lucene.apache.org
>>>>
>>>> Hey Saikat, thanks for your suggestion. I've looked into Mahout and other
>>>> alternatives for computing k nearest neighbors. I would have to run a job
>>>> to compute the k nearest neighbors and track them in the index for
>>>> retrieval. I wanted to see if this was something I could do with Lucene,
>>>> using Lucene's scoring function and Solr's MoreLikeThis component. The job
>>>> you specifically mention is for item-based recommendation, which would
>>>> require me to track the different items users have viewed. I'm looking for
>>>> a content-based approach where I would use a distance measure to establish
>>>> how near (how similar) items are, and have some kind of training phase to
>>>> adjust weights.
>>>>
>>>>
>>>> On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal <sxk1969@hotmail.com>
>>>> wrote:
>>>>> Why not just use Mahout to do this? There is an item similarity algorithm
>>>>> in Mahout that does exactly this :)
>>>>>
>>>>> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
>>>>> You can use mahout in distributed and non-distributed mode as well.
>>>>>
>>>>>> From: lcguerrerocovo@gmail.com
>>>>>> Date: Fri, 28 Jun 2013 12:16:57 -0500
>>>>>> Subject: Content based recommender using lucene/solr
>>>>>> To: solr-user@lucene.apache.org; java-user@lucene.apache.org
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm using Lucene and Solr right now in a production environment with an
>>>>>> index of about a million docs. I'm working on a recommender that basically
>>>>>> would list the n most similar items to the user based on the current item
>>>>>> he is viewing.
>>>>>>
>>>>>> I've been thinking of using Solr/Lucene since I already have all docs
>>>>>> available and I want a quick version that can be deployed while we work on
>>>>>> a more robust recommender. How about overriding the default similarity so
>>>>>> that it scores documents based on the Euclidean distance of normalized item
>>>>>> attributes, and then using a MoreLikeThis component to pass in the
>>>>>> attributes of the item for which I want to generate recommendations? I know
>>>>>> it has its issues, like recomputing scores, normalization, and weight
>>>>>> application at query time, which could make this idea infeasible or
>>>>>> impractical. I'm at a very preliminary stage right now with this and would
>>>>>> love some suggestions from experienced users.
>>>>>>
>>>>>> thank you,
>>>>>>
>>>>>> Luis Guerrero
>>>>>
>>>>
>>>>
>>>> --
>>>> Luis Carlos Guerrero Covo
>>>> M.S. Computer Engineering
>>>> (57) 3183542047
>
>



