mahout-user mailing list archives

From Parimi Rohit <rohit.par...@gmail.com>
Subject Cosine Similarity and LogLikelihood not helpful for implicit feedback!
Date Mon, 29 Sep 2014 23:24:35 GMT
Hi,

I am exploring a random-walk-based algorithm for recommender systems which
works by propagating users' item preferences over the user-user graph.
To do this, I have to compute user-user similarity and form a neighborhood.
I have tried the following three simple techniques to compute the score
between two users and find the neighborhood.

1. Score = (number of items common to users A and B) / (number of items
preferred by A + number of items preferred by B)
2. Scoring based on Mahout's Cosine Similarity
3. Scoring based on Mahout's LogLikelihood similarity.
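
To make the three scores concrete, here is a minimal Python sketch on
boolean (set-valued) preferences. It is an independent re-implementation for
illustration, not Mahout's code: the function names are made up, the LLR is
the standard G^2 statistic on a 2x2 contingency table, and the
1 - 1/(1 + LLR) mapping to [0, 1) is assumed to match what Mahout's
LogLikelihoodSimilarity does.

```python
import math


def naive_score(a, b):
    """Technique 1: |A & B| / (|A| + |B|). Identical sets score 0.5."""
    total = len(a) + len(b)
    return len(a & b) / total if total else 0.0


def boolean_cosine(a, b):
    """Cosine between two 0/1 vectors: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def _x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0


def _entropy(*counts):
    # Unnormalised Shannon entropy: N*log(N) - sum(k*log(k)).
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table."""
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(a, b, num_items):
    """LLR squashed into [0, 1); num_items is the size of the item catalogue."""
    k11 = len(a & b)                            # items both users prefer
    k12 = len(b) - k11                          # only user B
    k21 = len(a) - k11                          # only user A
    k22 = num_items - len(a) - len(b) + k11     # neither user
    return 1.0 - 1.0 / (1.0 + llr(k11, k12, k21, k22))


if __name__ == "__main__":
    user_a = {"conf1", "conf2", "conf3"}
    user_b = {"conf2", "conf3", "conf4"}
    print(naive_score(user_a, user_b))
    print(boolean_cosine(user_a, user_b))
    print(llr_similarity(user_a, user_b, num_items=100))
```

Note that only the LLR score depends on the total number of items; the other
two look at the two users' preference sets alone.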

My understanding is that similarity based on LogLikelihood is more robust;
however, I get better results using the naive approach (technique 1 in
the list above). The problems I am addressing are collaborator
recommendation, conference recommendation, and reference recommendation,
and the data consists of implicit feedback.

So my question is: are there cases where the cosine similarity and
log-likelihood metrics fail to capture similarity? For example, in the
problems stated above, users collaborate with only a few other users (based
on area of interest), publish in only a few conferences (again based on area
of interest), and cite publications in a specific domain. As a result, the
preference counts are fairly small compared to other domains (music, video,
etc.).

Secondly, for cosine similarity, should I treat the preferences as boolean
or use the counts? (I think the log-likelihood metric does not take the
preference counts into account; correct me if I am wrong.)
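
To illustrate the boolean-versus-counts question, here is a small Python
sketch using the textbook cosine over full sparse vectors. This is an
assumption for illustration and may differ from how Mahout's cosine
implementation treats missing preferences; the names `cosine` and `binarize`
and the example data are hypothetical.

```python
import math


def cosine(u, v):
    """Textbook cosine between two sparse vectors given as {item: weight}."""
    dot = sum(w * v[i] for i, w in u.items() if i in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def binarize(u):
    """Drop the counts: every item with a positive count gets weight 1."""
    return {i: 1.0 for i, w in u.items() if w > 0}


if __name__ == "__main__":
    # Hypothetical publication counts per conference for two authors.
    author_a = {"confA": 10, "confB": 1}
    author_b = {"confA": 1, "confB": 10}
    print(cosine(author_a, author_b))                      # with counts
    print(cosine(binarize(author_a), binarize(author_b)))  # boolean
```

In this example the two authors publish in the same two conferences, so the
boolean version treats them as identical, while the count-based version
scores them low because their counts are concentrated on different
conferences.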

Any insight into this is much appreciated.

Thanks,
Rohit

p.s. Ted, Pat: I am following the discussion on the thread
"LogLikelihoodSimilarity Calculation" and your answers helped me a lot to
understand how it works and made me wonder why things are different in my
case.
