mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Introducing randomness into my results
Date Sun, 03 Jul 2011 09:43:53 GMT
On Sun, Jul 3, 2011 at 8:05 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> For instance, if the recommendation engine recommends B if you have seen A
> and there is little other way to discover C which is ranked rather low (and
> thus never seen), then there is no way for the engine to even get training
> data about C.  The fact is, however, that exploring the space around good
> recommendations is a good thing to do.  This is referred to as the explore /
> exploit trade-off in the multi-armed bandit literature.

Agree, that's a good reason to mix it up. Recommendations are a
secondary source of possible new user-item interactions (i.e. that is
not the only way to discover C), but are far more productive at
driving serendipity than just waiting for it to happen. See below... I
guess I think of randomization as the crudest way to get this effect.
Surely your "anti-flooding" is more directed and effective?

(So I would have thought to randomize the ordering mostly in the
context of gathering data to separate the effect of position from
recommendation quality, since there you do want random shuffling.)


> I do it precisely to improve recommendations.

But you are not suggesting that jiggling the order improves these
recommendations *right now*, right? Rather, that it's a possible
information-gathering exercise for the future. I may have
misinterpreted the OP's question, but that is what I was narrowly
responding to.


> Not at all.  That should be done earlier in the recommendation engine
> itself.

(Yes, of course. It's precisely that I agree it's a different issue,
and I am wondering out loud whether this different and more basic
issue is at play or not -- or else we're talking about solutions to
the wrong issue!)


> This is related to yet a different problem that I call flooding.  The basic
> problem is that we make all recommendations independently, but what we want
> is the highest probability of finding the user's delight from a portfolio of
> recommendations.  As such, recommending duplicates is pointless because it
> reduces our 20 or so chances of success to one.  Similarly, it is not good
> to recommend items that are all near duplicates.  In fact, it isn't even
> good to recommend items that are recommended because of near duplicate
> reasons.  The counter-measures to this are what I call anti-flooding.  These
> are often heuristic in nature such as the use of equal number of
> recommendations from multiple genres or limiting the number of
> recommendations from the same artist.

> Nope.  The dithering is really for getting more exploratory behavior into
> the system.  Genres are for anti-flooding.

I follow the distinction you're trying to draw, and agree you can
extract two reasons or purposes, but both result in the same thing,
no? I remove some "better" recommendations on the theory that, while
they will probably be liked more, their utility is lower than some
"worse" recommendations. Do these need two approaches then?

Doing that in a directed way, your "anti-flooding", seems better than
randomizing for both purposes. Or: why would one expect that recs #11,
#12, and #13 are "exploratory" compared to #1 - #10?
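
A minimal sketch of one directed heuristic of the kind quoted above,
limiting the number of recommendations from the same artist. The
function name, the (item, artist) pair layout, and the default cap of
2 are assumptions made for illustration; none of this is Mahout API.

```python
from collections import Counter


def cap_per_artist(ranked_recs, max_per_artist=2):
    """Keep the ranked order, but drop items once an artist hits the cap.

    ranked_recs: list of (item, artist) pairs, best first (assumed layout).
    """
    seen = Counter()
    kept = []
    for item, artist in ranked_recs:
        if seen[artist] < max_per_artist:
            kept.append((item, artist))
            seen[artist] += 1
    return kept


recs = [("a1", "X"), ("a2", "X"), ("a3", "X"), ("b1", "Y")]
capped = cap_per_artist(recs, max_per_artist=2)
```

Here the third item from artist X is dropped, so the lower-ranked item
from artist Y moves up. The relative order of the surviving items is
unchanged.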

Or are you just saying that a bit of shuffling is just so easy and
simple and happens to add enough value to be useful in practice? I
could believe that. I have never tried it myself!


------

Here's an idea I've also never tried, and am sure there is a proper
term of art for.

Recs are often ranked by expected rating. But those expected ratings
are often computed as a mean over some set of samples (the many
estimates implied by item-item similarity and item rating for
instance). So there's really a normal distribution for each
recommended item about this mean rating. Instead of ranking by mean
you could sample from each of these little distributions and rank by
that sample. The resulting ranking may change according to the
sampling.

This also has an effect of mixing things up a little bit, though it's
at least driven more directly from the data: an item moves more or
less not based on its position and an RNG but based on the size of the
standard deviation of the samples that produced its estimated rating.
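
A rough sketch of that idea, under stated assumptions: each item comes
with the list of individual estimates behind its mean rating, and a
normal distribution with that mean and sample standard deviation is a
reasonable stand-in. The function name and the input layout are
hypothetical, and nothing here is Mahout API.

```python
import random
import statistics


def rank_by_sampled_rating(estimates_by_item, rng=None):
    """Rank items by one draw from a normal fitted to each item's estimates.

    estimates_by_item: dict mapping item -> list of rating estimates
    (assumed layout). rng: optional random.Random for repeatable draws.
    """
    rng = rng or random.Random()
    sampled = {}
    for item, estimates in estimates_by_item.items():
        mean = statistics.fmean(estimates)
        # A single estimate carries no spread information; keep it fixed.
        spread = statistics.stdev(estimates) if len(estimates) > 1 else 0.0
        sampled[item] = rng.gauss(mean, spread)
    # Highest sampled rating first.
    return sorted(sampled, key=sampled.get, reverse=True)


estimates = {
    "A": [4.0, 4.5, 3.5, 4.2],  # tight estimates: position rarely changes
    "B": [5.0, 2.0, 4.5, 3.0],  # wide spread: position moves more often
    "C": [3.9],                 # single estimate: never moves on its own
}
ranking = rank_by_sampled_rating(estimates, rng=random.Random(42))
```

Items whose estimates agree closely stay near their mean-based
position, while items with widely scattered estimates move more. Using
the standard error of the mean instead of the standard deviation would
be a tighter alternative; that choice is left open here.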
