mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: RowSimilarityJob with sparse matrix skips rows
Date Mon, 04 Aug 2014 06:22:07 GMT
On Fri, Jul 25, 2014 at 1:50 PM, Edith Au <edith.au@gmail.com> wrote:

> If
> user1 and user2 both like just one item, my instinct said that the
> similarity strength between the two should be high, regardless of the size of
> the universe.
>

Actually, this isn't really true.

Take for example a music web site which plays recommended music to you as a
radio sort of function.  When you first start playing, the site plays a
small promo clip describing the service.

In such a case, two users who both close the page after this promo will
share this item.  Indeed, all users of the site will share this item.  If
you change the promo, then only users who have started streams after the
change will share the new item and only users who started streams before
the change will share the old promo.

But this says absolutely nothing important about the users themselves
(consider the case of the promo that never changes for simplicity).
In that case, absolutely all users will share this item.

So clearly, this is an extreme counter-example to the idea that sharing a
single item indicates similarity.  The underlying frequencies matter a lot
here.  Even for items which are not absolutely universal, sharing a very
common item tells you little about how similar two users are.
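
To make the frequency point concrete, here is a minimal sketch in Python.
It is not what RowSimilarityJob does internally; it uses a simple
IDF-style weight, log(N / n_i), as a stand-in for any frequency-aware
measure.  The names (item_weight, shared_item_score) and the toy
histories are made up for illustration.

```python
import math


def item_weight(num_users, users_with_item):
    """IDF-style weight log(N / n_i); zero when every user has the item."""
    return math.log(num_users / users_with_item)


def shared_item_score(items_a, items_b, item_counts, num_users):
    """Sum the weights of the items that both users have in common."""
    shared = set(items_a) & set(items_b)
    return sum(item_weight(num_users, item_counts[i]) for i in shared)


# Toy data (hypothetical): every user has heard the promo clip,
# and only two users share a rarely played track.
histories = {
    "user1": {"promo"},
    "user2": {"promo"},
    "user3": {"promo", "rare_track"},
    "user4": {"promo", "rare_track"},
}

num_users = len(histories)

# Count how many users have each item.
item_counts = {}
for items in histories.values():
    for item in items:
        item_counts[item] = item_counts.get(item, 0) + 1

# user1 and user2 share only the universal promo item.
promo_only = shared_item_score(
    histories["user1"], histories["user2"], item_counts, num_users
)

# user3 and user4 also share an item that half the users have.
with_rare = shared_item_score(
    histories["user3"], histories["user4"], item_counts, num_users
)

print(promo_only, with_rare)
```

Under this weighting the pair that shares only the promo scores exactly
zero, while the pair that also shares the rarer track scores log(2).  The
exact weighting is an assumption; the point is only that a measure which
ignores item frequency would treat both kinds of overlap the same.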
