mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Cosine Similarity and LogLikelihood not helpful for implicit feedback!
Date Tue, 30 Sep 2014 05:08:18 GMT
How are you using LLR to compute user similarity?  It is normally used to
compute item similarity.

Also, what is your scale?  How many users?  How many items?  How many
actions per user?



On Mon, Sep 29, 2014 at 6:24 PM, Parimi Rohit <rohit.parimi@gmail.com>
wrote:

> Hi,
>
> I am exploring a random-walk based algorithm for recommender systems which
> works by propagating the item preferences for users on the user-user graph.
> To do this, I have to compute user-user similarity and form a neighborhood.
> I have tried the following three simple techniques to compute the score
> between two users and find the neighborhood.
>
> 1. Score = (common items between users A and B) / (items preferred by A +
> items preferred by B)
> 2. Scoring based on Mahout's Cosine Similarity
> 3. Scoring based on Mahout's LogLikelihood similarity.
>
> My understanding is that similarity based on LogLikelihood is more robust;
> however, I get better results using the naive approach (technique 1 from
> the above list). The problems I am addressing are collaborator
> recommendation, conference recommendation, and reference recommendation,
> and the data has implicit feedback.
>
> So, my question is: are there cases where the cosine similarity and
> loglikelihood metrics fail to capture similarity? For example, in the
> problems stated above, users collaborate with only a few other users
> (based on area of interest), publish in only a few conferences (again
> based on area of interest), and refer to publications in a specific
> domain. So, the preference counts are fairly small compared to other
> domains (music, video, etc.).
>
> Secondly, for CosineSimilarity, should I treat the preferences as boolean
> or use the counts? (I think the loglikelihood metric does not take the
> preference counts into account; correct me if I am wrong.)
>
> Any insight into this is much appreciated.
>
> Thanks,
> Rohit
>
> p.s. Ted, Pat: I am following the discussion on the thread
> "LogLikelihoodSimilarity Calculation" and your answers helped me a lot to
> understand how it works and made me wonder why things are different in my
> case.
>
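
The three scores discussed in the quoted message can be compared side by
side with a small sketch. This is plain Python, not Mahout code: the
function names and the toy data are made up for illustration, and the LLR
part is the standard G^2 statistic on a 2x2 table built from the two users'
item sets (common items, items only A has, items only B has, items neither
has). It assumes a known total number of items, `num_items`. Any rescaling
that Mahout applies to the raw LLR value is not reproduced here.

```python
import math


def overlap_score(a, b):
    """Technique 1: common items / (items of A + items of B)."""
    total = len(a) + len(b)
    return len(a & b) / total if total else 0.0


def cosine(u, v):
    """Cosine between sparse vectors given as dicts of item -> weight."""
    dot = sum(w * v[i] for i, w in u.items() if i in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    """Unnormalized entropy: N*log(N) - sum(k*log(k))."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def llr_user_score(a, b, num_items):
    """LLR between two users; only item sets are used, never counts."""
    both = len(a & b)
    return llr(both,
               len(a) - both,
               len(b) - both,
               num_items - len(a) - len(b) + both)


# Hypothetical publication counts per conference for two users.
counts_a = {"kdd": 5, "icml": 1, "nips": 1}
counts_b = {"kdd": 1, "icml": 4}
bool_a = {item: 1 for item in counts_a}
bool_b = {item: 1 for item in counts_b}

print("overlap        ", overlap_score(set(counts_a), set(counts_b)))
print("cosine (bool)  ", cosine(bool_a, bool_b))
print("cosine (counts)", cosine(counts_a, counts_b))
print("LLR            ", llr_user_score(set(counts_a), set(counts_b), 1000))
```

Because `llr_user_score` takes only item sets, changing the counts leaves
it unchanged, while the two cosine variants can differ noticeably on the
same pair of users.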
