MIME-Version: 1.0
In-Reply-To: 
 <CAEccTywgV=GL1ik-L+3pxBiu6+oMNCPY+=_Rm7f3wgEG8zTHQQ@mail.gmail.com>
References: <F0157953-1514-4CD9-A758-B22D675FA31C@gmail.com>
 <CAH9ofMYXi0ZVgKHtqZOfiOOqLa-XvFQPy_oDDkqS15WBtUfWqw@mail.gmail.com>
 <CAEccTywM6+eKwBZB2kXFony4CR==OWLvR-J48+04hfNZz3OGnQ@mail.gmail.com>
 <CAJwFCa34M5jXFzbMCE-nVxe0oD9OiFvFYt=N7xG4VKqLQxzEzg@mail.gmail.com>
 <50977135-E391-41BB-9148-07E7E88B0409@gmail.com>
 <CAEccTywgV=GL1ik-L+3pxBiu6+oMNCPY+=_Rm7f3wgEG8zTHQQ@mail.gmail.com>
From: Ted Dunning <ted.dunning@gmail.com>
Date: Fri, 7 Feb 2014 01:35:28 +0100
Message-ID: 
 <CAJwFCa0rx7ew=zPvWVQsS3m0jGzkWMcfsSO8H1F2wniLDu1gGw@mail.gmail.com>
Subject: Re: Popularity of recommender items
To: "user@mahout.apache.org" <user@mahout.apache.org>
Content-Type: multipart/alternative; boundary=089e0111e0da773efb04f1c62c1d

--089e0111e0da773efb04f1c62c1d
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Rising popularity is often a better match to what people want to see on a
"most popular" page.

The best measure for that in my experience is
log((new_count + offset) / (old_count + offset)), where new_count and
old_count are the number of views during the two periods in question. The
offset is there partly to avoid log(0) and x/0 problems, but also to give
the measure a Bayesian grounding.
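
A minimal sketch of that score, in Python. The offset value of 10 and the
item names and counts are made up for illustration; the right offset depends
on how many views you need before you trust a change.

```python
import math

def rising_popularity(new_count, old_count, offset=10.0):
    """Log ratio of smoothed view counts for two periods.

    offset is an assumed smoothing constant, not a recommended value.
    """
    if offset <= 0:
        raise ValueError("offset must be positive")
    return math.log((new_count + offset) / (old_count + offset))

# Hypothetical (old_count, new_count) view counts per item.
views = {
    "steady_hit": (10000, 10500),
    "rising": (40, 400),
    "brand_new": (0, 3),
    "fading": (500, 50),
}

# Rank items from fastest rising to fastest falling.
ranked = sorted(
    views,
    key=lambda item: rising_popularity(views[item][1], views[item][0]),
    reverse=True,
)
```

With these numbers the big steady hit lands in the middle of the ranking,
and the offset keeps the item that went from 0 to 3 views from jumping to
the top.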


On Thu, Feb 6, 2014 at 5:33 PM, Sean Owen <srowen@gmail.com> wrote:

> Agree - I thought by asking for most popular you meant to look for apple
> pie.
>
> Agree with you and Ted that the sum of similarity says something
> interesting even if it is not popularity exactly.
> On Feb 6, 2014 11:16 AM, "Pat Ferrel" <pat@occamsmachete.com> wrote:
>
> > The problem with the usual preference count is that big hit items can be
> > overwhelmingly popular. If you want to know which ones the most people saw
> > and are likely to have an opinion about then this seems a good measure. But
> > these hugely popular items may not differentiate taste.
> >
> > So we calculate the “important” taste indicators with LLR. The benefit of
> > the similarity matrix is that it attempts to model the “important”
> > cooccurrences.
> >
> > There is an effect of hugely popular items where they really say nothing
> > about similarity of taste. Everyone likes motherhood and apple pie so it
> > doesn’t say much about us if we both do too. This is usually accounted for
> > with something like TFIDF so I suppose another weighted popularity measure
> > would be to run the preference matrix through TFIDF to de-weight
> > non-differentiating preferences.
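
One possible reading of the TFIDF idea quoted above, as a rough Python
sketch. The IDF form log(n_users / n_users_with_item), the count * idf
"weighted popularity", and the toy preference data are all assumptions made
for illustration; the quoted text does not pin down a formula.

```python
import math
from collections import Counter

# Hypothetical boolean preferences: user -> set of preferred items.
prefs = {
    "u1": {"apple_pie", "jazz"},
    "u2": {"apple_pie", "jazz"},
    "u3": {"apple_pie", "jazz"},
    "u4": {"apple_pie", "opera"},
    "u5": {"apple_pie"},
}

def item_counts(prefs):
    """Number of users who expressed a preference for each item."""
    return Counter(item for items in prefs.values() for item in items)

def idf_weights(prefs):
    """Assumed IDF: log(n_users / n_users_with_item); 0 if everyone likes it."""
    n_users = len(prefs)
    return {item: math.log(n_users / c) for item, c in item_counts(prefs).items()}

def weighted_popularity(prefs):
    """Preference count scaled by IDF, so universal items are de-weighted."""
    idf = idf_weights(prefs)
    return {item: c * idf[item] for item, c in item_counts(prefs).items()}

idf = idf_weights(prefs)
weighted = weighted_popularity(prefs)
```

Here apple_pie, which every user prefers, gets weight zero and drops out of
the weighted ranking entirely.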
> >
> > On Feb 6, 2014, at 7:14 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >
> > If you look at the indicator matrix (cooccurrence reduced by LLR), you will
> > usually have asymmetry due to limitations on the number of indicators per
> > row.
> >
> > This will give you some interesting results when you look at the column
> > sums.  I wouldn't call it popularity, but it is an interesting measure.
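
A small Python sketch of why the per-row limit makes the indicator matrix
asymmetric and what the column sums then look like. The pair scores stand in
for LLR values and are invented, as are the function names and k = 2.

```python
from collections import defaultdict

# Hypothetical symmetric scores (stand-ins for LLR) between item pairs.
pair_scores = {
    ("a", "b"): 9.0, ("a", "c"): 8.0, ("a", "d"): 7.0,
    ("b", "c"): 2.0, ("b", "d"): 1.0, ("c", "d"): 0.5,
}

def top_k_indicators(pair_scores, k):
    """Build the full symmetric matrix, then keep the k best entries per row."""
    rows = defaultdict(dict)
    for (i, j), score in pair_scores.items():
        rows[i][j] = score
        rows[j][i] = score
    return {
        i: dict(sorted(cols.items(), key=lambda kv: kv[1], reverse=True)[:k])
        for i, cols in rows.items()
    }

def column_sums(indicator):
    """Sum the retained scores that point at each item."""
    sums = {item: 0.0 for item in indicator}
    for cols in indicator.values():
        for j, score in cols.items():
            sums[j] += score
    return sums

indicator = top_k_indicators(pair_scores, k=2)
sums = column_sums(indicator)
```

Row d keeps a as an indicator, but row a has no room left for d, so the
column sum for a is large while the column sum for d is zero.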
> >
> >
> >
> > On Thu, Feb 6, 2014 at 2:15 PM, Sean Owen <srowen@gmail.com> wrote:
> >
> > > I have always defined popularity as just the number of ratings/prefs,
> > > yes. You could rank on some kind of 'net promoter score' -- good
> > > ratings minus bad ratings -- though that becomes more like 'most
> > > liked'.
> > >
> > > How do you get popularity from similarity -- similarity to what?
> > > Ranking by sum of similarities seems more like a measure of how much
> > > the item is the 'centroid' of all items. Not necessarily most popular
> > > but 'least eccentric'.
> > >
> > >
> > > On Thu, Feb 6, 2014 at 7:41 AM, Tevfik Aytekin <tevfik.aytekin@gmail.com>
> > > wrote:
> > >> Well, I think what you are suggesting is to define popularity as being
> > >> similar to other items. So in this way most popular items will be
> > >> those which are most similar to all other items, like the centroids in
> > >> K-means.
> > >>
> > >> I would first check the correlation between this definition and the
> > >> standard one (that is, the definition of popularity as having the
> > >> highest number of ratings). But my intuition is that they are
> > >> different things. For example, an item might lie at the center in the
> > >> similarity space but it might not be a popular item. However, there
> > >> might still be some correlation; it would be interesting to check it.
> > >>
> > >> hope it helps
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Feb 5, 2014 at 3:27 AM, Pat Ferrel <pat@occamsmachete.com>
> > > wrote:
> > >>> Trying to come up with a relative measure of popularity for items in a
> > >>> recommender. Something that could be used to rank items.
> > >>>
> > >>> The user-item preference matrix would be the obvious thought. Just
> > >>> add the number of preferences per item. Maybe transpose the preference
> > >>> matrix (the temp DRM created by the recommender), then for each row
> > >>> vector (now that a row = item) grab the number of non-zero preferences.
> > >>> This corresponds to the number of preferences, and would give one measure
> > >>> of popularity. In the case where the items are not boolean you'd sum the
> > >>> weights.
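
The counting described in the quoted paragraph above can be sketched in
Python without an explicit transpose, by streaming over (user, item, weight)
triples. The triple layout and the sample data are assumptions for
illustration and are not the format of the recommender's temp DRM.

```python
from collections import defaultdict

# Hypothetical preferences as (user, item, weight); 0.0 means no preference.
prefs = [
    ("u1", "a", 5.0),
    ("u1", "b", 3.0),
    ("u2", "a", 4.0),
    ("u3", "a", 0.0),
    ("u3", "c", 2.0),
]

def item_popularity(prefs):
    """Return (non-zero preference count, summed weight) per item."""
    counts = defaultdict(int)
    weights = defaultdict(float)
    for _user, item, weight in prefs:
        if weight != 0:
            counts[item] += 1
            weights[item] += weight
    return dict(counts), dict(weights)

counts, weights = item_popularity(prefs)
```

For boolean data the two results rank items identically; with weighted
preferences the summed weights can reorder items that have the same count.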
> > >>>
> > >>> However it might be a better idea to look at the item-item similarity
> > >>> matrix. It doesn't need to be transposed and contains the "important"
> > >>> similarities--as calculated by LLR for example. Here similarity means
> > >>> similarity in which users preferred an item. So summing the non-zero
> > >>> weights would give perhaps an even better relative "popularity" measure.
> > >>> For the same reason clustering the similarity matrix would yield
> > >>> "important" clusters.
> > >>>
> > >>> Anyone have intuition about this?
> > >>>
> > >>> I started to think about this because transposing the user-item matrix
> > >>> seems to yield a format that cannot be sent directly into clustering.
> > >
> >
> >
>

--089e0111e0da773efb04f1c62c1d--