mahout-user mailing list archives

From: Ted Dunning <ted.dunn...@gmail.com>
Subject: Re: Introducing randomness into my results
Date: Sun, 03 Jul 2011 07:05:02 GMT
On Sat, Jul 2, 2011 at 11:34 AM, Sean Owen <srowen@gmail.com> wrote:

> Yes, that's well put. My only objection is that this sounds like you're
> saying that there is a systematic problem with the ordering, so it
> will usually help to pick any ordering different from the one you
> thought was optimal. Surely you do this in an effort to better learn
> what the right-er orderings are: are top recs under-performing their
> rank-adjusted expected click-through rate or something?


No.  I don't do it to evaluate the recommender.  I do it to allow the
recommendation engine to have a wider selection of things to learn about.

For instance, if the recommendation engine recommends B when you have seen A,
and there is little other way to discover C, which is ranked rather low (and
thus never seen), then there is no way for the engine to even get training
data about C.  The fact is, however, that exploring the space around good
recommendations is a good thing to do.  This is referred to as the explore /
exploit trade-off in the multi-armed bandit literature.
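
(For concreteness, here is a minimal sketch of the kind of dithering I have
in mind.  The class name, the noise parameter, and the log-rank-plus-noise
scoring are just illustrative assumptions, not code from Mahout or from any
production system.  The idea is to re-sort an already-ranked list by
log(rank) plus a little Gaussian noise so that items further down the list
occasionally bubble up and get a chance to generate training data.)

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Illustrative rank-based dithering: re-sort an already-ranked list by
// log(rank) plus Gaussian noise so that lower-ranked items occasionally
// surface and can pick up training data.
public class Dithering {
  public static <T> List<T> dither(List<T> ranked, double noise, Random rng) {
    int n = ranked.size();
    // Pair a perturbed score with each original index; smaller score is better.
    List<double[]> keys = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      keys.add(new double[] {Math.log(i + 1) + noise * rng.nextGaussian(), i});
    }
    keys.sort(Comparator.comparingDouble(k -> k[0]));
    List<T> out = new ArrayList<>(n);
    for (double[] k : keys) {
      out.add(ranked.get((int) k[1]));
    }
    return out;
  }
}

The size of the noise controls how much mixing you get: with a small noise
setting the top few recommendations stay nearly fixed, while items further
down the list get shuffled quite a lot.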



> (Why is a different question.) That's what I mean by "for evaluation"
> rather
> than "as a way to improve recs per se" and I imagine it's just a
> difference of semantics.
>

I do it precisely to improve recommendations.



> It sounds like the problem here is preventing popular items from
> dominating recommendations just because they are generally popular --
>

Not at all.  That should be done earlier in the recommendation engine
itself.


> because being in recommendations makes them popular and you have a
> positive feedback loop.


Well, if you redefine popular to mean popular in a particular context, then
I agree.



> It's a problem; it's often one that simple
> approaches to recs have since they don't naturally normalize away
> general popularity that is not specific to the current user.


Oops, I disagree again.  This problem of the recommender drinking its own
bathwater persists even without this top-40 problem.

> For example, simplistic similarity metrics like simple co-occurrence would
> strongly favor popular items in a way that is clearly undesirable.
>

Yes.  But that is a different problem.


> On this note, I like the approach of systems like iTunes, which appear
> to try to give you top recs in several genres rather than one
> monolithic list of recommendations. I find it much more reasonable
> to recommend by genre; that's a better-sized world from which to
> recommend.


This is related to yet a different problem that I call flooding.  The basic
problem is that we make all recommendations independently, but what we want
is the highest probability of finding the user's delight from a portfolio of
recommendations.  As such, recommending duplicates is pointless because it
reduces our 20 or so chances of success to one.  Similarly, it is not good
to recommend items that are all near duplicates.  In fact, it isn't even
good to recommend items that are recommended for near-duplicate reasons.
The counter-measures to this are what I call anti-flooding.  These are
often heuristic in nature, such as taking an equal number of
recommendations from multiple genres or limiting the number of
recommendations from the same artist.
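
(Again purely as an illustration, with the method name and per-key cap being
my own labels rather than anything standard: one simple anti-flooding pass
walks the ranked list in order and keeps at most a fixed number of items that
share the same artist or genre.)

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative anti-flooding pass: walk the ranked list in order and keep at
// most maxPerKey items sharing the same key (e.g. artist or genre).
public class AntiFlooding {
  public static <T, K> List<T> limitPerKey(List<T> ranked,
                                           Function<T, K> keyOf,
                                           int maxPerKey) {
    Map<K, Integer> counts = new HashMap<>();
    List<T> kept = new ArrayList<>();
    for (T item : ranked) {
      K key = keyOf.apply(item);
      int seen = counts.getOrDefault(key, 0);
      if (seen < maxPerKey) {
        counts.put(key, seen + 1);
        kept.add(item);
      }
    }
    return kept;
  }
}

Because the pass keeps scanning the whole ranked list, slots freed by a
crowded artist or genre are naturally filled by the next best items from
elsewhere.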

> So, if *that* is the sort of issue you're trying to solve by shaking
> up recommendations, maybe it would be more meaningful to look at
> restricting recommendations. Can you list the top 2 recs from 5
> popular categories?


Nope.  The dithering is really for getting more exploratory behavior into
the system.  Genres are for anti-flooding.
