mahout-user mailing list archives

From Jonathan Hodges
Subject Can someone suggest an approach for calculating precision and recall for distributed recommendations?
Date Sun, 26 Aug 2012 14:47:54 GMT

We have been tasked with producing video recommendations for our users. We
get about 100 million video views per month and track users and the videos
they watch, but we do not currently collect rating or preference values.
Later we plan to use implicit data, such as the percentage of a video
watched, to infer preferences, but for the first release we are limited to
Boolean viewing data. To that end we started by using Mahout’s distributed
RecommenderJob with the LogLikelihoodSimilarity algorithm to generate 50
video recommendations for each user. We would like to gauge how well we are
doing by measuring the precision and recall of these recommendations
offline. We know we should divide the viewing data into training and test
sets, but we are not sure what steps to take next. For the non-distributed
approach we would use IRStatistics to get the precision and recall values,
but there does not seem to be an equally simple solution within the Mahout
framework for the Hadoop-based calculations.
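
To make the question concrete, here is a rough sketch of the offline
calculation we have in mind, assuming RecommenderJob has been run on the
training split only. It is not a Mahout API: the function names are
hypothetical, each output line is assumed to look like
userID<TAB>[itemID:score,itemID:score,...], and the videos each user watched
in the held-out test split are treated as the only "relevant" items.

```python
def parse_recommendations(lines, top_n=50):
    """Parse recommender output lines into {user: [item, ...]}.

    Assumed line format (an assumption, check your job's actual output):
        userID<TAB>[itemID:score,itemID:score,...]
    """
    recs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, items = line.split("\t", 1)
        items = items.strip("[]")
        # Keep only the item IDs, in ranked order, truncated to top_n.
        recs[user] = [pair.split(":")[0] for pair in items.split(",") if pair][:top_n]
    return recs


def precision_recall_at_n(recs, held_out):
    """Average per-user precision and recall over users with test data.

    recs:     {user: [recommended item, ...]} built from the training split
    held_out: {user: set of items the user viewed in the test split}
    """
    precisions, recalls = [], []
    for user, relevant in held_out.items():
        if not relevant:
            continue  # nothing to evaluate against for this user
        recommended = recs.get(user, [])
        hits = len(set(recommended) & relevant)
        precisions.append(hits / len(recommended) if recommended else 0.0)
        recalls.append(hits / len(relevant))
    n = len(precisions)
    if n == 0:
        return 0.0, 0.0
    return sum(precisions) / n, sum(recalls) / n


if __name__ == "__main__":
    sample_output = ["u1\t[v1:0.9,v2:0.8]", "u2\t[v3:0.5]"]
    test_views = {"u1": {"v1", "v9"}, "u2": {"v4"}}
    p, r = precision_recall_at_n(parse_recommendations(sample_output), test_views)
    print("precision=%.3f recall=%.3f" % (p, r))
```

With Boolean data, a video that is recommended but not in the test split is
counted as a miss even though the user might have liked it, so numbers from
a sketch like this are probably best read as relative comparisons between
runs rather than absolute quality.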

Can someone please share/suggest their techniques for evaluating
recommendation accuracy with Mahout’s Hadoop-based distributed algorithms?

Thanks in advance,

