mahout-user mailing list archives

From Manuel Blechschmidt <Manuel.Blechschm...@gmx.de>
Subject Evaluation of different recommendation algorithms for 12.000 user data set
Date Mon, 21 Nov 2011 11:06:54 GMT
Hello Mahout Team, hello users,
a friend and I are currently evaluating recommendation techniques for personalizing a newsletter
for a company selling tea, spices and some other products. Mahout is a great product that
saves me hours of time and a lot of money. Because I want to give something back, I am
writing this small case study to the mailing list.

I am running an offline evaluation to find out which recommender is the most accurate. I am
also interested in runtime behavior such as memory consumption and execution time.

The data contains implicit feedback. A user's preference for a product is the amount in grams
that he or she bought of it (453 g ~ 1 pound). If a product does not have this data, the value
is replaced with 50. So basically I want Mahout to predict how much of a certain product a
user will buy next. This is also helpful for demand planning. I am currently not using any
time data because I did not find a recommender that uses it.
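
The preference encoding described above could look roughly like the following sketch. It is
not the code used for this study: the class name PreferenceEncoder, both method names, and the
convention that a null weight means "no weight recorded" are assumptions made for
illustration. Only the default value of 50 comes from the description above. The output line
follows the comma-separated userID,itemID,preference layout that Mahout's FileDataModel reads.

```java
import java.util.Locale;

class PreferenceEncoder {
    // Default from the description above: products without weight data count as 50.
    static final double DEFAULT_GRAMS = 50.0;

    // Hypothetical helper: a null weight means "no weight recorded".
    static double toPreference(Double grams) {
        return grams == null ? DEFAULT_GRAMS : grams;
    }

    // Formats one preference as "userID,itemID,preference".
    // Locale.ROOT keeps the decimal separator a dot regardless of system locale.
    static String toCsvLine(long userId, long itemId, Double grams) {
        return String.format(Locale.ROOT, "%d,%d,%.1f", userId, itemId, toPreference(grams));
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(1L, 42L, 453.0)); // prints 1,42,453.0
        System.out.println(toCsvLine(1L, 43L, null));  // prints 1,43,50.0
    }
}
```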

Users: 12858
Items: 5467
121304 preferences
MaxPreference: 85850.0 (Meaning that there is someone who ordered 85 kg of a certain tea or
spice)
MinPreference: 50.0

Here are the raw accuracy benchmarks in RMSE. They vary by about 15% between evaluation
runs:

randomBased (baseline): 43045.380570443434
    RandomRecommender(model)
    Time: ~0.3 s, Memory: 16 MB

ItemBased with Pearson correlation: 315.5804958647985
    GenericItemBasedRecommender(model, PearsonCorrelationSimilarity(model))
    Time: ~1 s, Memory: 35 MB

ItemBased with uncentered cosine: 198.25393235323375
    GenericItemBasedRecommender(model, UncenteredCosineSimilarity(model))
    Time: ~1 s, Memory: 32 MB

ItemBased with log likelihood: 176.45243607278724
    GenericItemBasedRecommender(model, LogLikelihoodSimilarity(model))
    Time: ~5 s, Memory: 42 MB

UserBased 3 with Pearson correlation: 1378.1188069379868
    GenericUserBasedRecommender(model,
        NearestNUserNeighborhood(3, PearsonCorrelationSimilarity(model), model),
        PearsonCorrelationSimilarity(model))
    Time: ~52 s, Memory: 57 MB

UserBased 20 with Pearson correlation: 1144.1905989614288
    GenericUserBasedRecommender(model,
        NearestNUserNeighborhood(20, PearsonCorrelationSimilarity(model), model),
        PearsonCorrelationSimilarity(model))
    Time: ~51 s, Memory: 57 MB

SlopeOne: 464.8989330869532
    SlopeOneRecommender(model)
    Time: ~4 s, Memory: 604 MB

SVD based: 326.1050823499026
    ALSWRFactorizer(model, 100, 0.3, 5)
    Time: (not given), Memory: 691 MB

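These were measured with the following method. Note that
AverageAbsoluteDifferenceRecommenderEvaluator reports the average absolute difference; for a
true RMSE, Mahout provides RMSRecommenderEvaluator: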

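// randomBased is a RecommenderBuilder; null means the default DataModelBuilder is used.
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
// 0.9 = share of each user's preferences used for training, 1.0 = share of users evaluated.
double evaluation = evaluator.evaluate(randomBased, null, myModel, 0.9, 1.0);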
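
To make the metric concrete: the average absolute difference is the mean of
|predicted - actual| over the held-out preferences, while RMSE is the square root of the mean
squared difference. The standalone sketch below computes both on made-up numbers. The class
name ErrorMetrics, the method names and the sample values are illustrative assumptions and
are not part of Mahout or of this data set.

```java
class ErrorMetrics {
    // Mean of |predicted - actual|.
    static double meanAbsoluteError(double[] predicted, double[] actual) {
        checkLengths(predicted, actual);
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    // Square root of the mean squared difference.
    static double rootMeanSquaredError(double[] predicted, double[] actual) {
        checkLengths(predicted, actual);
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / predicted.length);
    }

    private static void checkLengths(double[] a, double[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    public static void main(String[] args) {
        // Made-up example: three exact predictions and one that is off by 400 g.
        double[] predicted = {50, 50, 50, 600};
        double[] actual = {50, 50, 50, 1000};
        System.out.println(meanAbsoluteError(predicted, actual));    // prints 100.0
        System.out.println(rootMeanSquaredError(predicted, actual)); // prints 200.0
    }
}
```

With preference values that range from 50 to 85850, a single large order that is predicted
badly weighs much more heavily on RMSE than on the average absolute difference, so the two
metrics are not interchangeable when comparing runs.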

Memory usage was about 50 MB in the item-based case. Slope One and the SVD-based recommender
seem to use the most memory (615MB & 691MB).

The performance differs a lot. The fastest ones were the item-based recommenders. They took
about 1 to 5 seconds (PearsonCorrelationSimilarity and UncenteredCosineSimilarity 1 s,
LogLikelihoodSimilarity 5 s).
The user-based ones were a lot slower.

My conclusion is that in my case the item-based approach is the fastest, has the lowest memory
consumption and is the most accurate. In addition, I can use the recommendedBecause function.
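
For readers unfamiliar with item-based similarity, here is a simplified sketch of an
uncentered cosine between two items, computed over the users who bought both. It only
illustrates the idea and is not Mahout's UncenteredCosineSimilarity implementation. The
class name CosineSketch, the method name and the sample purchases are assumptions.

```java
import java.util.Map;

class CosineSketch {
    // Each map holds userID -> grams bought for one item.
    static double uncenteredCosine(Map<Long, Double> itemA, Map<Long, Double> itemB) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<Long, Double> entry : itemA.entrySet()) {
            Double b = itemB.get(entry.getKey());
            if (b == null) {
                continue; // only users who bought both items contribute
            }
            double a = entry.getValue();
            dot += a * b;
            normA += a * a;
            normB += b * b;
        }
        if (normA == 0.0 || normB == 0.0) {
            return Double.NaN; // no overlapping users, similarity undefined
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Made-up purchases: users 1 and 2 bought both items in the same proportion.
        Map<Long, Double> tea = Map.of(1L, 100.0, 2L, 200.0, 3L, 50.0);
        Map<Long, Double> spice = Map.of(1L, 50.0, 2L, 100.0, 4L, 500.0);
        System.out.println(uncenteredCosine(tea, spice)); // close to 1.0
    }
}
```

Because the measure is not mean-centered, it depends only on the direction of the purchase
vectors, not on their absolute amounts: buying twice as much of both items leaves the
similarity unchanged.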

Here is the spec of the computer:
2.3GHz Intel Core i5 (4 Cores). 1024 MB for the Java virtual machine.

The next step, probably within the next 2 months, is to design a newsletter and send it
to the customers. Then I can benchmark the user acceptance rate of the recommendations.

Any suggestions for enhancements are appreciated. If anybody is interested in the dataset
or the evaluation code, please send me a private email. I might be able to convince the company
to give out the dataset if the person is doing some interesting research.

/Manuel
-- 
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B

