# mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Introducing randomness into my results
Date Sat, 02 Jul 2011 16:42:57 GMT
```
I agree that randomness is good to introduce, but I disagree that it is only
for evaluation purposes.  In particular, I find it useful to smooth out some
discontinuities that result from the incestuous relationship between
recommendation results and later recommendations.  Without this, you can
quickly wind up with a very strong echo in your recommendations of the first
page of your previous recommendations.  By dithering result orders, you can
adjust the ...
I have heard and seen very dramatic improvements in recommendation quality
with dithering.

That aside, a useful way to do the reordering is to generate a synthetic
score for each item based on its rank r in the original sorted result set.

double newScore = Math.exp(- r / scale) + fuzz * random.nextDouble();
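
Applied to a whole result list, this might look like the minimal Java
sketch below.  It assumes the list is already sorted best-first (rank r is
position + 1) and that scale and fuzz are doubles; the Dither class and the
dither method are hypothetical names for illustration, not Mahout API.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration; not part of Mahout.
public final class Dither {
  private Dither() {}

  // Returns a dithered copy of items, which must be sorted best-first.
  public static <T> List<T> dither(List<T> items, double scale, double fuzz,
                                   Random random) {
    int n = items.size();
    double[] scores = new double[n];
    List<Integer> positions = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      int r = i + 1;  // 1-based rank, as in exp(-(1:30)/scale) in the R runs
      // Synthetic score: exponential decay in rank plus uniform noise in [0, fuzz).
      scores[i] = Math.exp(-r / scale) + fuzz * random.nextDouble();
      positions.add(i);
    }
    // Sort the original positions by synthetic score, highest first.
    positions.sort((a, b) -> Double.compare(scores[b], scores[a]));
    List<T> result = new ArrayList<>(n);
    for (int p : positions) {
      result.add(items.get(p));
    }
    return result;
  }
}

With fuzz = 0 the original order comes back unchanged; a larger fuzz lets
deeper items move further up.  Sorting positions rather than the items
themselves keeps the synthetic scores out of the item type, so the same
sketch works for any kind of result.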

The size of fuzz determines where the reordering will occur.  Results at
ranks r and r+1 can only swap where the gap between their base scores,
exp(-r/scale) - exp(-(r+1)/scale), is smaller than fuzz (for example,
fuzz = exp(-2), which is about 0.135).  With scale = 1, 3 and 10 and
fuzz = 0.1, here are some sample re-orderings (the R runs below use
scale = 3, then 1, then 10):

> order(-exp(-(1:30)/3) + 0.1 * runif(30))
[1]  1  2  3  4  5  6  8  7  9 13 12 26 14 28 27 24 17 10 11 20 29 19 16 15
23 22 25 30 21 18
> order(-exp(-(1:30)/3) + 0.1 * runif(30))
[1]  1  2  3  4  6  5 11  7  8 21 20 29 16 24 25 14 19 26  9 15 13 28 10 22
12 17 27 18 30 23
> order(-exp(-(1:30)/3) + 0.1 * runif(30))
[1]  1  2  3  4  6  5  7  8 12 16 26 18 24 21  9 22 28 10 11 19 29 30 15 20
14 23 17 25 13 27
> order(-exp(-(1:30)/3) + 0.1 * runif(30))
[1]  1  2  3  4  5  6  7  8 10 11 28 20 26 18 29 19 22 24 25  9 15 21 27 30
13 16 14 12 17 23

> order(-exp(-(1:30)) + 0.1 * runif(30))
[1]  1  2 17 23 19 11  8 21  3 28 12 22 25 13 30  9 27 24  7  4 14 29 10  5
6 20 26 15 16 18
> order(-exp(-(1:30)) + 0.1 * runif(30))
[1]  1  2 15 25 20 29 14  5  7 16 10  3 18 13 17 23  8 26 22 12 21 24  4 30
9 27 28 19 11  6
> order(-exp(-(1:30)) + 0.1 * runif(30))
[1]  1  2 19  3  5 28 23 24 12 25  7 22 17 21  4  6 20 13 16 29  9 14 30 10
27  8 18 15 26 11

> order(-exp(-(1:30)/10) + 0.1 * runif(30))
[1]  1  2  3  5  4  7  6  8  9 10 11 12 13 14 16 18 15 22 19 17 23 24 25 20
21 29 30 27 28 26
> order(-exp(-(1:30)/10) + 0.1 * runif(30))
[1]  1  2  3  4  5  6  7  8  9 10 12 11 13 16 14 18 15 20 17 21 19 26 29 25
30 22 24 23 27 28
> order(-exp(-(1:30)/10) + 0.1 * runif(30))
[1]  1  2  3  5  4  7  6  8  9 12 11 10 13 16 14 15 20 19 17 22 25 18 21 28
24 23 29 26 27 30

As you can see, with scale = 1, only the first 2 results are stable and very
deep results can be surfaced.  With scale = 10, reordering only becomes very
strong below 10-20 results.  I usually use scale = 3 because result pages
are usually 20 long.  With shorter result pages, scale = 2 is probably
justified.
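
To put rough numbers on this, here is a small Java sketch, assuming
fuzz > 0 and ranks starting at r = 1; DitherDepth and firstSurfacingRank are
hypothetical names, not Mahout API.  It finds the first rank whose base
score exp(-r/scale) is already below fuzz, i.e. roughly beyond
r = scale * ln(1/fuzz).  From that rank on, an item from far down the
original list has a nonzero chance of being dithered above it.

// Hypothetical helper for illustration; not part of Mahout.
public final class DitherDepth {
  private DitherDepth() {}

  // Smallest rank r with exp(-r/scale) < fuzz: from here on, items whose
  // base score is close to 0 can be dithered above the item at rank r.
  public static int firstSurfacingRank(double scale, double fuzz) {
    if (scale <= 0 || fuzz <= 0) {
      throw new IllegalArgumentException("scale and fuzz must be positive");
    }
    int r = 1;
    // exp(-r/scale) decreases toward 0, so this loop terminates for fuzz > 0.
    while (Math.exp(-r / scale) >= fuzz) {
      r++;
    }
    return r;
  }
}

For fuzz = 0.1 this gives rank 3 for scale = 1, rank 7 for scale = 3 and
rank 24 for scale = 10, which roughly matches the orderings above.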

On Sat, Jul 2, 2011 at 12:56 AM, Sean Owen <srowen@gmail.com> wrote:

> Yes, it's a good idea. Usually it serves a purpose for evaluation
> only. You know the relative strength of recommendations, and know how
> much ranking them 1st, 2nd, 3rd, etc. biases the user to click on them.
> So you can predict how many clicks each should relatively get. And you
> can easily pull up recommendation #10 if you want to see if it gets
> unusually more clicks than you'd expect. This tells you there's
> something suboptimal about recs if so.
>
> Don't make similarity change randomly. It is supposed to have
> properties like symmetry and transitivity, which would then break.
>
> You can make the neighborhood pick other users, yes. But I think the
> most direct way to do this is to reorder the final recommendations.
> That works for any implementation. So I would do #2.
>
> But again I would not do this just for its own sake; on its face, it
> hurts recommendation quality. Do so if you are using it to evaluate
> quality.
>
>
> On Fri, Jul 1, 2011 at 7:42 PM, Salil Apte <salil@offlinelabs.com> wrote:
> > My first post to the Mahout group. First, Mahout devs, you have created
> > something great so thanks!
> >
> > I have inherited some Mahout code and am trying to make some
> > improvements. I was hoping to get some guidance.
> >
> > 1. We are using the NearestNUserNeighborhood class for neighborhood
> > calculations. While I want to use the similarity metrics provided in
> > Mahout, I also want to introduce some randomness. In effect, I want to
> > include a few people into the final nearest neighbors set that are not
> > actually that close. That way, my recommender will include some outliers
> > into the results which is a desirable property for our recommender.
> > What's the best way of doing this? I can of course implement my own
> > similarity metric (which could internally use
> > PearsonCorrelationSimilarity) and then randomly give a high correlation
> > number to certain people. But is there a better way?
> >
> > 2. I also want to introduce some randomness into the final recommended
> > set. I am thinking I can do this by creating a custom IDRescorer and
> > randomly bumping up the score for some of the items. This will of course
> > require some tweaking (how often an item gets a bump, how much of a bump
> > does it get,