mahout-user mailing list archives

From Michael Sokolov <msoko...@safaribooksonline.com>
Subject Re: Solr-recommender
Date Wed, 09 Oct 2013 19:54:31 GMT
On 10/9/13 3:08 PM, Pat Ferrel wrote:
> Solr uses cosine similarity for its queries. The implementation on github uses Mahout
> LLR for calculating the item-item similarity matrix, but when you do the more-like-this
> query at runtime Solr uses cosine. This can be fixed in Solr, not sure how much work.
It's not clear to me whether it's worth "fixing" this or not.  It would 
certainly complicate scoring calculations when mixing with traditional 
search terms.
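For reference, the LLR score being discussed is just Ted's G-squared test over the 2x2
cooccurrence counts. A minimal standalone sketch in Java, essentially the same statistic
Mahout computes in org.apache.mahout.math.stats.LogLikelihood:

// Sketch of the LLR (G^2) score over a 2x2 cooccurrence table.
// k11 = users with both items, k12 = item A only, k21 = item B only, k22 = neither.
public final class LlrSketch {

  public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    // guard against tiny negative values from rounding
    return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
  }

  private static double entropy(long... counts) {
    long sum = 0;
    double result = 0.0;
    for (long c : counts) {
      sum += c;
      result += xLogX(c);
    }
    return xLogX(sum) - result;
  }

  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }
}

The appeal of LLR here is that it keys on anomalously frequent cooccurrence rather than raw
counts, which is what makes it work well for sparsifying the item-item matrix.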
>
> It sounds like you are doing item-item similarities for recommendations, not actually
> calculating user-history based recs, is that true?
Yes that's true so far.  Our recommender system has the ability to 
provide recs based on user history, but we have not deployed this in our 
app yet.  My plan was simply to query based on all the items in the 
user's "basket" - not sure that this would require a different back 
end?  We're not at the moment considering user-user similarity measures.
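A rough SolrJ sketch of that kind of basket query - the core name, the "indicators" field
and the item ids are all made up for illustration:

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class BasketQuery {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/items");

    // everything currently in the user's "basket" (or recent history)
    List<String> basket = Arrays.asList("item17", "item23", "item99");

    StringBuilder terms = new StringBuilder();
    for (String item : basket) {
      terms.append(item).append(' ');
    }
    // one OR query against the indicator field; Solr's scoring does the ranking
    SolrQuery q = new SolrQuery("indicators:(" + terms.toString().trim() + ")");
    for (String item : basket) {
      q.addFilterQuery("-id:" + item);   // don't recommend what's already in the basket
    }
    q.setRows(10);

    for (SolrDocument doc : solr.query(q).getResults()) {
      System.out.println(doc.getFieldValue("id"));
    }
  }
}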
>
> You bring up a point that we're finding. I'm not so sure we need or want a recommender
> query API that is separate from the Solr query API. What we are doing on our demo site is
> putting the output of the Solr-recommender where Solr can index it. Our web app framework
> then allows very flexible queries against Solr, using simple user history, producing the
> typical user-history based recommendations, or mixing/boosting based on metadata or
> contextual data. If we leave the recommender query API in Solr we get web app framework
> integration for free.
>
> Another point is where the data is stored for the running system. If we allow Solr to
> index from any storage service that it supports then we also get free integration with
> most any web app framework and storage service. For the demo site we put the data in a DB
> and have Solr index it from there. We also store the user history and metadata there. This
> is supported by most web app frameworks out of the box. You could go a different route and
> use almost any storage system/file system/content format since Solr supports a wide
> variety.
>
> Given a fully flexible Solr standard query and indexing scheme, all you need to do is
> tweak the query or data source a bit and you have an item-set recommender (shopping
> cart), a contextual recommender (for example, boost recs from a category), or a pure
> metadata/content based recommender.
>
> If the query and storage is left to Solr+web app framework then the github version is
> complete if not done. Solr still needs LLR in the more-like-this queries. Term weights to
> encode strength scores would also be nice and I agree that both of these could use some
> work.
I would like to take a look at that version - I may have missed some 
discussion about it; would you post a link please?
>
> BTW lest we forget this does not imply the Solr-recommender is better than Myrrix or
> the Mahout-only recommenders. There needs to be some careful comparison of results.
> Michael, did you do offline or A/B tests during your implementation?

I ran some offline tests using our historical data, but I don't have a 
lot of faith in these beyond the fact they indicate we didn't make any 
obvious implementation errors.  We haven't attempted A/B testing yet 
since our site is so new, and we need to get a meaningful baseline going 
and sort out a lot of other more pressing issues on the site - 
recommendations are only one piece, albeit an important one.


Actually there was an interesting idea for an article posted recently 
about the difficulty of comparing results across systems in this field: 
http://www.docear.org/2013/09/23/research-paper-recommender-system-evaluation-a-quantitative-literature-survey/

but that's no excuse not to do better.  I'll certainly share when I know 
more :)

-Mike
>
> On Oct 9, 2013, at 6:13 AM, Michael Sokolov <msokolov@safaribooksonline.com> wrote:
>
> Just to add a note of encouragement for the idea of better integration between Mahout
> and Solr:
>
> On safariflow.com, we've recently converted our recommender, which computes similarity
> scores w/Mahout, from storing scores and running queries w/Postgres, to doing all that in
> Solr.  It's been a big improvement, both in terms of indexing speed, and more importantly,
> the flexibility of the queries we can write.  I believe that having scoring built in to
> the query engine is a key feature for recommendations.  More and more I am coming to
> believe that recommendation should just be considered as another facet of search: as one
> among many variables the system may take into account when presenting relevant information
> to the user.  In our system, we still clearly separate search from recommendations, and we
> probably will always do that to some extent, but I think we will start to blend the
> queries more so that there will be essentially a continuum of query options including more
> or less "user preference" data.
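A rough sketch of what that blend could look like as a single Solr query - the field names
and the boost weight are illustrative, not what safariflow.com actually runs:

import org.apache.solr.client.solrj.SolrQuery;

public class BlendedQuery {
  // mix an ordinary relevance query with a recommendation signal via a boost query
  static SolrQuery blended(String userQuery, String userIndicators) {
    SolrQuery q = new SolrQuery(userQuery);               // e.g. content:"machine learning"
    q.set("defType", "edismax");
    // boost docs whose indicator field matches the user's history;
    // the weight controls how much "user preference" leaks into ordinary search
    q.set("bq", "indicators:(" + userIndicators + ")^0.3");
    return q;
  }
}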
>
> I think what I'm talking about may be a bit different than what Pat is describing (in
> implementation terms), since we do LLR calculations off-line in Mahout and then bulk load
> them into Solr.  We took one of Ted's earlier suggestions to heart, and simply ignored the
> actual numeric scores: we index the top N similar items for each item.  Later we may
> incorporate numeric scores in Solr as term weights.  If people are looking for things to
> do :) I think that would be a great software contribution that could spur this effort
> onward since it's difficult to accomplish right now given the Solr/Lucene indexing
> interfaces, but is already supported by the underlying data model and query engine.
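In sketch form the bulk load is just documents with a multivalued indicator field - the
"id" and "indicators" field names here are illustrative, not the real schema:

import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndicatorLoader {
  // itemId -> its top-N most similar items as computed offline with LLR, scores dropped
  static void load(SolrServer solr, String itemId, List<String> topSimilar) throws Exception {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", itemId);
    for (String similar : topSimilar) {
      doc.addField("indicators", similar);   // multivalued string field
    }
    solr.add(doc);   // commit separately, or rely on autoCommit
  }
}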
>
>
> -Mike
>
> On 10/2/13 12:19 PM, Pat Ferrel wrote:
>> Excellent. From Ellen's description the first Music use may be an implicit preference
>> based recommender using synthetic data? I'm quickly discovering how flexible Solr use is
>> in many of these cases.
>>
>> Here's another use you may have thought of:
>>
>> Shopping cart recommenders, as goes the intuition, are best modeled as recommending
>> from similar item-sets. If you store all shopping carts as your training data (play
>> lists, watch lists etc.) then as a user adds things to their cart you query for the most
>> similar past carts. Combine the results intelligently and you'll have an item set
>> recommender. Solr is built to do this item-set similarity. We tried to do this for an
>> ecom site with pure Mahout but the similarity calc in real time stymied us. We knew we'd
>> need Solr but couldn't devote the resources to spin it up.
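A rough sketch of that item-set flow against Solr - the "items" field, the 20-cart cutoff
and the naive vote count are all placeholders for the "combine the results intelligently"
step:

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrDocument;

public class CartRecommender {
  // past carts are indexed one per document, with a multivalued "items" field
  static List<String> recommend(SolrServer solr, Collection<String> cart) throws Exception {
    StringBuilder terms = new StringBuilder();
    for (String item : cart) {
      terms.append(item).append(' ');
    }
    SolrQuery q = new SolrQuery("items:(" + terms.toString().trim() + ")");
    q.setRows(20);                               // the 20 most similar past carts

    // naive combine step: vote-count items across the matching carts
    final Map<String, Integer> votes = new HashMap<String, Integer>();
    for (SolrDocument pastCart : solr.query(q).getResults()) {
      Collection<Object> items = pastCart.getFieldValues("items");
      if (items == null) {
        continue;
      }
      for (Object o : items) {
        String item = o.toString();
        if (!cart.contains(item)) {
          Integer n = votes.get(item);
          votes.put(item, n == null ? 1 : n + 1);
        }
      }
    }
    List<String> ranked = new ArrayList<String>(votes.keySet());
    Collections.sort(ranked, new Comparator<String>() {
      public int compare(String a, String b) {
        return votes.get(b) - votes.get(a);
      }
    });
    return ranked;
  }
}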
>>
>> On the con side Solr has a lot of stuff you have to work around. It also does not have
>> the ideal similarity measure for many uses (cosine is ok but LLR would probably be
>> better). You don't want stop word filtering, stemming, white space based tokenizing or
>> n-grams. You would like explicit weighting. A good thing about Solr is how well it
>> integrates with virtually any doc store independent of the indexing and query. A bit of
>> an oval peg for a round hole.
>>
>> It looks like the similarity code is replaceable if not pluggable. Much of the rest
>> could be trimmed away by config or adherence to conventions I suspect. In the demo site
>> I'm working on I've had to adopt some slightly hacky conventions that I'll describe some
>> day.
>>
>> On Oct 1, 2013, at 10:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>
>>
>> Pat,
>>
>> Ellen and some folks in Britain have been working with some data I produced from
>> synthetic music fans.
>>
>>
>> On Tue, Oct 1, 2013 at 2:22 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
>> Hi Ellen,
>>
>>
>> On Oct 1, 2013, at 12:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>
>>
>> As requested,
>>
>> Pat, meet Ellen.
>>
>> Ellen, meet Pat.
>>
>>
>>
>>
>> On Tue, Oct 1, 2013 at 8:46 AM, Pat Ferrel <pat.ferrel@gmail.com> wrote:
>> Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout version.
>>
>> Things to note:
>> 1) The pure Mahout XRecommenderJob needs a cross-LLR or a cross-similarity job.
>> Currently there is only cooccurrence for sparsification, which is far from optimal. This
>> might take the form of a cross RSJ with two DRMs as input. I can't commit to this but
>> would commit to adding it to the XRecommenderJob.
>> 2) output to Solr needs a lot of options implemented and tested. The hand-run test
>> should be made into some junits. I'm slowly doing this.
>> 3) the Solr query API is unimplemented unless someone else is working on that. I'm
>> building one in a demo site but it looks to me like a static recommender API is not going
>> to be all that useful and maybe a document describing how to do it with the Solr query
>> interface would be best, especially for a first step. The reasoning here is that it is so
>> tempting to mix in metadata to the recommendation query that a static API is not so
>> obvious. For the demo site the recommender API will be prototyped in a bunch of ways
>> using models and controllers in Rails. If I'm the one to do a Java Solr-recommender query
>> API it will be after experimenting a bit.
>>
>> Can someone introduce me to Ellen and Tim?
>>
>> On Sep 28, 2013, at 10:59 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>
>> The one large-ish feature that I think would find general use would be a high
>> performance classifier trainer.
>>
>> For the cleanup sort of thing, it would be good to fully integrate the streaming
>> k-means into the normal clustering commands while revamping the command line API.
>>
>> Dmitriy's recent Scala work would help quite a bit before 1.0. Not sure it can make 0.9.
>>
>> For recommendations, I think that the demo system that Pat started with the
>> elaborations by Ellen and Tim would be very good to have.
>>
>> I would be happy to collaborate with somebody on these but am not at all likely to
>> have time to actually do them end to end.
>>
>> Sent from my iPhone
>>
>> On Sep 28, 2013, at 12:40, Grant Ingersoll <gsingers@apache.org> wrote:
>>
>>> Moving closer to 1.0, removing cruft, etc.  Do we have any more major features
>>> planned for 1.0?  I think we said during 0.8 that we would try to follow pretty quickly
>>> w/ another release.
>>>
>>> -Grant
>>>
>>> On Sep 28, 2013, at 12:33 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>
>>>> Sounds right in principle but perhaps a bit soon.
>>>>
>>>> What would define the release?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Sep 27, 2013, at 7:48, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>
>>>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>>>>
>>>>> -Grant
>>> --------------------------------------------
>>> Grant Ingersoll | @gsingers
>>> http://www.lucidworks.com
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>


