mahout-user mailing list archives

From Salil Apte <sa...@offlinelabs.com>
Subject Re: Introducing randomness into my results
Date Sat, 02 Jul 2011 19:45:01 GMT
These are all great answers; it will certainly take a few reads for me to
digest everything you've said :)

Re: reasons for randomness, I'd like to introduce some serendipity into the
recommendation system. In short, even though your nearest neighbors have the
highest likelihood of "liking" the same items as you, there may be items
that neither you nor your nearest neighbors know you would like until you
see them. This is the classic "you don't know what you don't know".

I really like the idea of segmenting results so that random items don't
pollute the results that have the highest likelihood of being good
recommendations. This is something I will explore from both an algorithmic
and a UI standpoint.
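
As a very rough sketch of that segmentation (assuming the recommender hands
back a rank-ordered List of Mahout RecommendedItem objects; the class name
and the topPicks/wildcards split are just illustrative, not anything in
Mahout itself):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    import org.apache.mahout.cf.taste.recommender.RecommendedItem;

    public final class SegmentedRecs {

      private SegmentedRecs() {
      }

      /** The strongest recommendations, untouched, for the main "top picks" slice. */
      public static List<RecommendedItem> topPicks(List<RecommendedItem> recs, int n) {
        return new ArrayList<RecommendedItem>(recs.subList(0, Math.min(n, recs.size())));
      }

      /**
       * A separate "wildcards" slice sampled at random from the ranks below the
       * top picks, so serendipitous items never displace the best guesses.
       */
      public static List<RecommendedItem> wildcards(List<RecommendedItem> recs,
                                                    int numTopPicks,
                                                    int numWildcards,
                                                    Random random) {
        if (recs.size() <= numTopPicks) {
          return Collections.<RecommendedItem>emptyList();
        }
        List<RecommendedItem> tail =
            new ArrayList<RecommendedItem>(recs.subList(numTopPicks, recs.size()));
        Collections.shuffle(tail, random);
        return tail.subList(0, Math.min(numWildcards, tail.size()));
      }
    }

The two slices can then be labeled differently in the UI (e.g. "recommended
for you" vs. "something different"), so the serendipitous picks don't read
as bad top recommendations.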

I'll start work on this stuff in the coming weeks and let you know how it
turns out.

Thanks for the help, guys.



On Sat, Jul 2, 2011 at 11:34 AM, Sean Owen <srowen@gmail.com> wrote:

> Yes, that's well put. My only objection is that this sounds like you're
> saying that there is a systematic problem with the ordering, so it
> will usually help to pick any different ordering than the one you
> thought was optimal. Surely you do this in an effort to better learn
> what the right-er orderings are: are top recs under-performing their
> rank-adjusted expected click-through rate or something? (Why is a
> different question.) That's what I mean by "for evaluation" rather
> than "as a way to improve recs per se" and I imagine it's just a
> difference of semantics.
>
>
> It sounds like the problem here is preventing popular items from
> dominating recommendations just because they are generally popular --
> because being in recommendations makes them popular and you have a
> positive feedback loop. It's a problem; it's often one that simple
> approaches to recs have since they don't naturally normalize away
> general popularity that is not specific to the current user. For
> example, simplistic similarity metrics like simple co-occurrence would
> strongly favor popular items in a way that is clearly undesirable.
>
> I suspect that anyone who thinks they have this problem probably has
> a relatively easy-to-fix algorithmic issue, and if so it's better to
> fix that directly.
>
>
> On this note, I like the approach of systems like iTunes, which appear
> to try to give you top recs in several genres rather than one
> monolithic list of recommendations. It seems much more reasonable to
> recommend by genre; that's a better-sized world from which to
> recommend. I find it hard to reason about whether I like a certain
> rock album better than a certain classical piece, though I can tell
> you easily which rock songs are better than other rock songs. So, a
> system that tries to combine all of these things will struggle: is the
> idea of "top 10 recommendations from across all of music" just
> problematic to begin with? (Go look at your list of all
> recommendations across all object types at Amazon -- they do a great
> job, but man, how do you meaningfully state that an electric
> toothbrush slots between an Alan Partridge DVD and Harry Potter?)
>
> So, if *that* is the sort of issue you're trying to solve by shaking
> up recommendations, maybe it would be more meaningful to look at
> restricting recommendations. Can you list the top 2 recs from 5
> popular categories? Or can you heavily bias the recommendations to
> take into account current context (current item)? Those are also ways
> to surface more "lower" items.
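
A rough sketch of that kind of restriction, assuming the category for each
item comes from your own catalog data (the ItemCategoryLookup interface
below is hypothetical; Mahout itself has no notion of genres):

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.mahout.cf.taste.recommender.RecommendedItem;

    public final class TopPerCategory {

      /** Hypothetical catalog lookup; Mahout knows nothing about item categories. */
      public interface ItemCategoryLookup {
        String categoryOf(long itemID);
      }

      /**
       * Walks a rank-ordered recommendation list and keeps at most
       * maxPerCategory items per category, preserving the original order
       * within each category.
       */
      public static Map<String, List<RecommendedItem>> topPerCategory(
          List<RecommendedItem> recs, ItemCategoryLookup lookup, int maxPerCategory) {
        Map<String, List<RecommendedItem>> byCategory =
            new LinkedHashMap<String, List<RecommendedItem>>();
        for (RecommendedItem rec : recs) {
          String category = lookup.categoryOf(rec.getItemID());
          List<RecommendedItem> slot = byCategory.get(category);
          if (slot == null) {
            slot = new ArrayList<RecommendedItem>();
            byCategory.put(category, slot);
          }
          if (slot.size() < maxPerCategory) {
            slot.add(rec);
          }
        }
        return byCategory;
      }
    }

Rendering, say, the first two items from each of the five largest slots
gives the "top 2 recs from 5 popular categories" layout described above.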
>
>
> Just some more ideas...
>
>
>
>
> On Sat, Jul 2, 2011 at 5:42 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > I agree that randomness is good to introduce, but I disagree that it is
> > only for evaluation purposes.  In particular, I find it useful to smooth
> > out some discontinuities that result from the incestuous relationship
> > between recommendation results and later recommendations.  Without this,
> > you can quickly wind up with a very strong echo in your recommendations
> > of the first page of your previous recommendations.  By dithering result
> > orders, you can adjust the exploration / exploitation trade-off in your
> > system.  In anecdotal reports, I have heard and seen very dramatic
> > improvements in recommendation quality with dithering.
> >
> > That aside, a useful way to do the reordering is to generate a synthetic
> > score for each item based on its rank r in the original sorted result set.
> >
> >    double newScore = Math.exp(- r / scale) + fuzz * random.nextDouble();
> >
> > The size of fuzz determines where the reordering will occur.  Since the
> > two random offsets differ by at most fuzz, items at adjacent ranks r and
> > r+1 can only swap when exp(-r/scale) - exp(-(r+1)/scale) < fuzz.  For
> > instance, with scale = 1, 3 and 10 and fuzz = 0.1, here are some sample
> > re-orderings:
> >
> >> order(-exp(-(1:30)/3) + 0.1 * runif(30))
> >  [1]  1  2  3  4  5  6  8  7  9 13 12 26 14 28 27 24 17 10 11 20 29 19 16 15 23 22 25 30 21 18
> >> order(-exp(-(1:30)/3) + 0.1 * runif(30))
> >  [1]  1  2  3  4  6  5 11  7  8 21 20 29 16 24 25 14 19 26  9 15 13 28 10 22 12 17 27 18 30 23
> >> order(-exp(-(1:30)/3) + 0.1 * runif(30))
> >  [1]  1  2  3  4  6  5  7  8 12 16 26 18 24 21  9 22 28 10 11 19 29 30 15 20 14 23 17 25 13 27
> >> order(-exp(-(1:30)/3) + 0.1 * runif(30))
> >  [1]  1  2  3  4  5  6  7  8 10 11 28 20 26 18 29 19 22 24 25  9 15 21 27 30 13 16 14 12 17 23
> >
> >> order(-exp(-(1:30)) + 0.1 * runif(30))
> >  [1]  1  2 17 23 19 11  8 21  3 28 12 22 25 13 30  9 27 24  7  4 14 29 10  5  6 20 26 15 16 18
> >> order(-exp(-(1:30)) + 0.1 * runif(30))
> >  [1]  1  2 15 25 20 29 14  5  7 16 10  3 18 13 17 23  8 26 22 12 21 24  4 30  9 27 28 19 11  6
> >> order(-exp(-(1:30)) + 0.1 * runif(30))
> >  [1]  1  2 19  3  5 28 23 24 12 25  7 22 17 21  4  6 20 13 16 29  9 14 30 10 27  8 18 15 26 11
> >
> >> order(-exp(-(1:30)/10) + 0.1 * runif(30))
> >  [1]  1  2  3  5  4  7  6  8  9 10 11 12 13 14 16 18 15 22 19 17 23 24 25 20 21 29 30 27 28 26
> >> order(-exp(-(1:30)/10) + 0.1 * runif(30))
> >  [1]  1  2  3  4  5  6  7  8  9 10 12 11 13 16 14 18 15 20 17 21 19 26 29 25 30 22 24 23 27 28
> >> order(-exp(-(1:30)/10) + 0.1 * runif(30))
> >  [1]  1  2  3  5  4  7  6  8  9 12 11 10 13 16 14 15 20 19 17 22 25 18 21 28 24 23 29 26 27 30
> >
> > As you can see, with scale = 1, only the first 2 results are stable and
> > very deep results can be surfaced.  With scale = 10, reordering only
> > becomes strong beyond the first 10-20 results.  I usually use scale = 3
> > because result pages are usually 20 items long.  With shorter result
> > pages, scale = 2 is probably justified.
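
A minimal Java sketch of this kind of dithering, assuming a rank-ordered
List<RecommendedItem> as returned by Mahout's Taste recommenders (the class
and parameter names here are just illustrative):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Random;

    import org.apache.mahout.cf.taste.recommender.RecommendedItem;

    public final class Dither {

      /**
       * Re-sorts a rank-ordered recommendation list by the synthetic score
       * exp(-rank / scale) + fuzz * U(0,1): top ranks stay nearly stable while
       * deeper items are shuffled more and more aggressively.
       */
      public static List<RecommendedItem> dither(List<RecommendedItem> recs,
                                                 double scale, double fuzz,
                                                 Random random) {
        // Synthetic score for each original rank 0, 1, 2, ...
        final double[] scores = new double[recs.size()];
        for (int r = 0; r < recs.size(); r++) {
          scores[r] = Math.exp(-r / scale) + fuzz * random.nextDouble();
        }
        // Sort the original ranks by descending synthetic score.
        List<Integer> ranks = new ArrayList<Integer>();
        for (int r = 0; r < recs.size(); r++) {
          ranks.add(r);
        }
        Collections.sort(ranks, new Comparator<Integer>() {
          @Override
          public int compare(Integer a, Integer b) {
            return Double.compare(scores[b], scores[a]);
          }
        });
        // Emit the items in their new, dithered order.
        List<RecommendedItem> dithered = new ArrayList<RecommendedItem>(recs.size());
        for (int rank : ranks) {
          dithered.add(recs.get(rank));
        }
        return dithered;
      }
    }

Calling dither(recs, 3.0, 0.1, new Random()) on each request roughly mirrors
the middle block of R samples above: the first few positions barely move,
while deeper positions churn on every page load.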
> >
> >
> >
> > On Sat, Jul 2, 2011 at 12:56 AM, Sean Owen <srowen@gmail.com> wrote:
> >
> >> Yes, it's a good idea. Usually it serves a purpose for evaluation
> >> only. You know the relative strength of recommendations, and you know
> >> how much ranking them 1st, 2nd, 3rd, etc. biases the user to click on
> >> them. So you can predict how many clicks each should get, relative to
> >> the others. And you can easily pull up recommendation #10 if you want
> >> to see whether it gets more clicks than you'd expect. If so, that tells
> >> you there's something suboptimal about the recs.
> >>
> >> Don't make similarity change randomly. It is supposed to have
> >> properties like symmetry and transitivity, which would then break.
> >>
> >> You can make the neighborhood pick other users, yes. But I think the
> >> most direct way to do this is to reorder the final recommendations.
> >> That works for any implementation. So I would do #2.
> >>
> >> But again I would not do this just for its own sake; on its face, it
> >> hurts recommendation quality. Do so if you are using it to evaluate
> >> quality.
> >>
> >>
> >> On Fri, Jul 1, 2011 at 7:42 PM, Salil Apte <salil@offlinelabs.com> wrote:
> >> > My first post to the Mahout group. First, Mahout devs, you have
> >> > created something great, so thanks!
> >> >
> >> > I have inherited some Mahout code and am trying to make some
> >> > improvements. I was hoping to get some guidance.
> >> >
> >> > 1. We are using the NearestNUserNeighborhood class for neighborhood
> >> > calculations. While I want to use the similarity metrics provided in
> >> > Mahout, I also want to introduce some randomness. In effect, I want to
> >> > include a few people in the final nearest-neighbors set who are not
> >> > actually that close. That way, the results will include some outliers,
> >> > which is a desirable property for our recommender. What's the best way
> >> > of doing this? I can of course implement my own similarity metric
> >> > (which could internally use PearsonCorrelationSimilarity) and then
> >> > randomly give a high correlation number to certain people. But is
> >> > there a better way?
> >> >
> >> > 2. I also want to introduce some randomness into the final recommended
> >> > set. I am thinking I can do this by creating a custom IDRescorer and
> >> > randomly bumping up the score for some of the items. This will of
> >> > course require some tweaking (how often an item gets a bump, how much
> >> > of a bump it gets, etc.), but is there a better way of going about
> >> > this?
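
For reference, a minimal sketch of such a rescorer, assuming the Taste
IDRescorer interface (rescore(long, double) and isFiltered(long)); the
bumpProbability and bumpAmount knobs are made up for illustration:

    import java.util.Random;

    import org.apache.mahout.cf.taste.recommender.IDRescorer;

    /**
     * Randomly bumps the estimated preference of some items so a few of them
     * float up the final recommendation list.
     */
    public final class RandomBumpRescorer implements IDRescorer {

      private final Random random;
      private final double bumpProbability;  // chance a given item gets a bump
      private final double bumpAmount;       // how much its score is raised

      public RandomBumpRescorer(Random random, double bumpProbability, double bumpAmount) {
        this.random = random;
        this.bumpProbability = bumpProbability;
        this.bumpAmount = bumpAmount;
      }

      @Override
      public double rescore(long id, double originalScore) {
        return random.nextDouble() < bumpProbability
            ? originalScore + bumpAmount
            : originalScore;
      }

      @Override
      public boolean isFiltered(long id) {
        return false;  // never drop items, only reorder them
      }
    }

It would then be passed as the third argument to
Recommender.recommend(userID, howMany, rescorer).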
> >> >
> >> > Thanks for the help!
> >> >
> >> > -Salil
> >> >
> >>
> >
>
