mahout-user mailing list archives

From Sebastian Schelter <>
Subject Playing with the dataset
Date Tue, 23 Nov 2010 14:07:35 GMT

I'm currently looking into the dataset (from as 
I'm planning to write a magazine article or blog post on how to create a 
simple music recommender with Mahout. It should be an easy-to-follow 
tutorial that encourages people to download Mahout and play a little 
with the recommender stuff.

The dataset consists of several million 
(userID,artist,numberOfPlays)-tuples, and my goal is to find the most 
similar artists and recommend new artists to users. I extracted a 20% 
sample of the data, ignored the numberOfPlays, and used an item-based 
recommender with LogLikelihoodSimilarity. A few random tests gave 
reasonable results.
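
For readers unfamiliar with the measure, here is a minimal sketch of a
log-likelihood-ratio similarity between two artists when only "user
listened to artist" is known and numberOfPlays is ignored. It is written
from scratch and does not call Mahout; the class name ArtistLlrSketch,
the method names, and the mapping of the ratio into a similarity via
1 - 1/(1 + llr) are assumptions made for illustration, not a statement
of how Mahout is implemented.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of Mahout.
final class ArtistLlrSketch {

    // x * ln(x), with the convention 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a list of counts.
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            parts += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - parts;
    }

    // Log-likelihood ratio for a 2x2 contingency table:
    // k11 = users who listened to both artists
    // k12 = users who listened to A but not B
    // k21 = users who listened to B but not A
    // k22 = users who listened to neither
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Maps the ratio into [0, 1); this particular mapping is an assumption.
    static double similarity(Set<Long> usersOfA, Set<Long> usersOfB, long totalUsers) {
        Set<Long> both = new HashSet<Long>(usersOfA);
        both.retainAll(usersOfB);
        long k11 = both.size();
        long k12 = usersOfA.size() - k11;
        long k21 = usersOfB.size() - k11;
        long k22 = totalUsers - k11 - k12 - k21;
        double llr = logLikelihoodRatio(k11, k12, k21, k22);
        return 1.0 - 1.0 / (1.0 + llr);
    }
}
```

The ratio is close to zero when the two artists are heard together about
as often as chance would predict, and grows as the overlap departs from
chance.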

Now I want to go on and include the "strength" of the preference in the 
computation. What would be the best way to deal with the numberOfPlays? 
I thought about using the log of the numberOfPlays as the rating value 
and applying PearsonCorrelationSimilarity as the measure. Would that be 
a viable way to approach this problem?
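
To make the proposal concrete, here is a minimal sketch of the
computation: take the natural log of each play count as the rating and
compute the Pearson correlation between two artists over the users who
played both. It is plain Java and does not use Mahout's
PearsonCorrelationSimilarity; the class name PlayCountPearsonSketch, the
map layout (userID to numberOfPlays), and the choice to return NaN when
the correlation is undefined are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of Mahout.
final class PlayCountPearsonSketch {

    // Rating derived from a play count; assumes numberOfPlays >= 1,
    // so a single play maps to a rating of 0.
    static double logRating(long numberOfPlays) {
        return Math.log(numberOfPlays);
    }

    // Pearson correlation of log ratings over users who played both artists.
    // Each map goes from userID to numberOfPlays for one artist.
    static double pearson(Map<Long, Long> playsA, Map<Long, Long> playsB) {
        List<Double> xs = new ArrayList<Double>();
        List<Double> ys = new ArrayList<Double>();
        for (Map.Entry<Long, Long> entry : playsA.entrySet()) {
            Long playsOfB = playsB.get(entry.getKey());
            if (playsOfB != null) {
                xs.add(logRating(entry.getValue()));
                ys.add(logRating(playsOfB));
            }
        }
        int n = xs.size();
        if (n < 2) {
            return Double.NaN; // not enough common users
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += xs.get(i);
            meanY += ys.get(i);
        }
        meanX /= n;
        meanY /= n;
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = xs.get(i) - meanX;
            double dy = ys.get(i) - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0.0 || varY == 0.0) {
            return Double.NaN; // one artist has identical ratings everywhere
        }
        return cov / Math.sqrt(varX * varY);
    }
}
```

The log compresses the heavy tail of play counts, so a user with 1000
plays does not outweigh a user with 10 plays by a factor of 100. Note
that the correlation only looks at users both artists have in common,
so pairs with very few shared listeners produce unstable values.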

