mahout-user mailing list archives

From Ted Dunning <>
Subject Re: Problems with Mahout's RecommenderIRStatsEvaluator
Date Sat, 16 Feb 2013 23:12:11 GMT
There are a variety of common time-based effects that make time splits best in many practical
cases.  Having the training data come entirely from the past emulates real-world conditions better than random splits do.

For one thing, the same user can appear under different names in training and test.  For
another, in real life you only have data from the past of the user under consideration.  As
a third consideration, topical events can influence all users in common.

These all mean that random training splits can have very large error in estimated performance.
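A time-based split of the kind described above can be sketched as follows. This is a toy illustration, not Mahout's evaluator: the tuple layout, field order, and 80/20 cutoff are assumptions made for the example.

```python
# Sketch of a time-based train/test split for recommender evaluation.
# Interactions are assumed to be (user, item, rating, timestamp) tuples;
# the names and the 80/20 cutoff are illustrative, not from Mahout.

def time_split(interactions, train_fraction=0.8):
    """Train on the oldest interactions, test on the newest."""
    ordered = sorted(interactions, key=lambda x: x[3])  # sort by timestamp
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

interactions = [
    ("alice", "item1", 5.0, 100),
    ("bob",   "item2", 3.0, 200),
    ("alice", "item3", 4.0, 300),
    ("carol", "item1", 2.0, 400),
    ("bob",   "item3", 5.0, 500),
]
train, test = time_split(interactions)
# Every training timestamp precedes every test timestamp, so the model
# never sees the "future" of the users it is evaluated on.
assert max(t[3] for t in train) <= min(t[3] for t in test)
```

Because the cut is on time rather than on randomly sampled ratings, the split avoids the leakage effects listed above (future data from the same user, shared topical events).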

Sent from my iPhone

On Feb 16, 2013, at 1:41 PM, Tevfik Aytekin <> wrote:

> What I mean is that you can choose ratings randomly and try to recommend
> the ones above the threshold.
> On Sat, Feb 16, 2013 at 10:32 PM, Sean Owen <> wrote:
>> Sure, if you were predicting ratings for one movie given a set of ratings
>> for that movie and the ratings for many other movies. That isn't what the
>> recommender problem is. Here, the problem is to list N movies most likely
>> to be top-rated. The precision-recall test is, in turn, a test of top N
>> results, not a test over prediction accuracy. We aren't talking about RMSE
>> here or even any particular means of generating top N recommendations. You
>> don't even have to predict ratings to make a top N list.
>>> On Sat, Feb 16, 2013 at 9:28 PM, Tevfik Aytekin <> wrote:
>>> No, rating prediction is clearly a supervised ML problem
>>> On Sat, Feb 16, 2013 at 10:15 PM, Sean Owen <> wrote:
>>>> This is a good answer for evaluation of supervised ML, but, this is
>>>> unsupervised. Choosing randomly is choosing the 'right answers' randomly,
>>>> and that's plainly problematic.
>>>> On Sat, Feb 16, 2013 at 8:53 PM, Tevfik Aytekin <> wrote:
>>>>> I think, it is better to choose ratings of the test user in a random
>>>>> fashion.
>>>>> On Sat, Feb 16, 2013 at 9:37 PM, Sean Owen <> wrote:
>>>>>> Yes. But: the test sample is small. Using 40% of your data to test is
>>>>>> probably too much.
>>>>>> My point is that it may be the least-bad thing to do. What test are you
>>>>>> proposing instead, and why is it coherent with what you're testing?
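The top-N precision-recall test discussed in the thread can be sketched as follows. This is a toy illustration, not Mahout's RecommenderIRStatsEvaluator itself; the function name, the data, and the 4.0 relevance threshold are assumptions made for the example.

```python
# Toy precision/recall-at-N for a top-N recommender test.
# "Relevant" items are the held-out items the user rated at or above a
# threshold; names and data are illustrative, not Mahout's API.

def precision_recall_at_n(recommended, relevant, n):
    top_n = recommended[:n]
    hits = len(set(top_n) & relevant)
    precision = hits / n
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Held-out ratings for one user; a threshold of 4.0 defines relevance.
held_out = {"m1": 5.0, "m2": 2.0, "m3": 4.5, "m4": 3.0}
relevant = {item for item, r in held_out.items() if r >= 4.0}  # m1, m3

recommended = ["m1", "m4", "m3"]  # the recommender's top-3 list
p, r = precision_recall_at_n(recommended, relevant, n=3)
# p = 2/3 (two of three recommendations are relevant), r = 1.0
```

Note that, as Sean points out, nothing here requires predicted ratings: only the ordered top-N list and the set of held-out relevant items enter the computation.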
