mahout-user mailing list archives

From Zhongduo Lin <zhong...@gmail.com>
Subject Re: Question about evaluating a Recommender System
Date Wed, 08 May 2013 15:00:40 GMT
Thank you for the quick response.

I agree that a neighborhood size of 2 will make the predictions more
sensible. But my concern is that a neighborhood size of 2 can only
predict a very small proportion of the preferences for each user. Looking
at the previous example, how can it predict item 4 if item 4
happens to be chosen for the test set? I think this is quite common in
my case, as well as for Amazon or eBay, since the ratings are very sparse.
So I just don't understand how the evaluation can still run.

User 1                rated item 1, 2, 3, 4
neighbour1 of user 1  rated item 1, 2
neighbour2 of user 1  rated item 1, 3
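
A minimal sketch of the concern (plain Python, not Mahout code; the ratings
are hypothetical values attached to the toy example above): with only these
two neighbours, a user-based predictor has no data at all for item 4.

```python
# Toy data from the example above: user 1's two nearest neighbours and
# the items each has rated (the rating values themselves are made up).
neighbour_ratings = {
    "neighbour1": {1: 4.0, 2: 3.0},
    "neighbour2": {1: 5.0, 3: 2.0},
}

def predict(item, neighbours):
    """Average the neighbours' ratings for `item`; None if nobody rated it."""
    rated = [r[item] for r in neighbours.values() if item in r]
    return sum(rated) / len(rated) if rated else None

# Items 1, 2 and 3 are predictable, but if item 4 is held out for
# testing, no neighbour has rated it, so no estimate is possible:
print(predict(2, neighbour_ratings))  # 3.0
print(predict(4, neighbour_ratings))  # None
```

My understanding (worth confirming against the source) is that evaluators of
this kind simply skip test preferences that cannot be estimated, which would
explain why the overall run still completes.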


I wouldn't expect the root-mean-square error to behave differently from 
the average absolute difference here, since in that case most of the 
predictions are close to 1, resulting in a near-zero error whether I use 
the absolute difference or RMSE. How can I measure "RMSE relative 
to the variance of the data set" using Mahout? Unfortunately I got an 
error when using the precision and recall evaluation method; I guess that's 
because the data are too sparse.
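
One way to make "RMSE relative to the variance" concrete (a hand-rolled
sketch in plain Python, not a Mahout API; the ratings are hypothetical):
divide the RMSE by the standard deviation of the held-out ratings, which is
the RMSE of a baseline that always predicts the mean. A ratio near or above
1 means the predictor is no better than guessing the mean.

```python
import math
import statistics

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length rating lists."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

# Hypothetical held-out ratings, and a predictor that (like the bad
# mapping discussed below) outputs 1 almost everywhere.
actual    = [1, 1, 1, 1, 5]
predicted = [1, 1, 1, 1, 1]

raw = rmse(actual, predicted)
spread = statistics.pstdev(actual)  # std-dev of the test ratings

# The absolute RMSE looks small, but normalized by the data's spread it
# exceeds 1: the constant predictor is worse than predicting the mean.
print(raw, spread, raw / spread)
```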

Best Regards,
Jimmy


On 13-05-08 10:05 AM, Sean Owen wrote:
> It may be true that the results are best with a neighborhood size of
> 2. Why is that surprising? Very similar people, by nature, rate
> similar things, which makes the things you held out of a user's test
> set likely to be found in the recommendations.
>
> The mapping you suggest is not that sensible, yes, since almost
> everything maps to 1. Not surprisingly, most of your predictions are
> near 1. That's "better" in an absolute sense, but RMSE is worse
> relative to the variance of the data set. This is not a good mapping
> -- or else, RMSE is not a very good metric, yes. So, don't do one of
> those two things.
>
> Try mean average precision for a metric that is not directly related
> to the prediction values.
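
For what "mean average precision" computes, a small self-contained sketch
(plain Python, not Mahout's evaluator; the item IDs and held-out set are
hypothetical) of the per-user average precision, which MAP then averages
over all users:

```python
def average_precision(recommended, relevant):
    """Mean of precision@k over each rank k where a relevant item appears,
    divided by the number of relevant items."""
    hits, precisions = 0, []
    for k, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

# Hypothetical: items 4 and 7 were held out as "relevant" for a user,
# and the recommender returned this ranked list.
print(average_precision([4, 9, 7, 2], {4, 7}))  # (1/1 + 2/3) / 2 = 0.8333...
```

Because it scores the *ranking* of held-out items rather than predicted
rating values, it is unaffected by how the preference scale is mapped.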
>
> On Wed, May 8, 2013 at 2:45 PM, Zhongduo Lin <zhongduo@gmail.com> wrote:
>> Thank you for your reply.
>>
>> I think the evaluation process involves randomly choosing the evaluation
>> proportion. The problem is that I always get the best result when I set
>> the neighborhood size to 2, which seems unreasonable to me, since there
>> should be many test cases that the recommender cannot predict at all. So
>> why do I still get a valid result? How does Mahout handle this case?
>>
>> Sorry I didn't make myself clear in the second question. Here is the
>> problem: I have a set of inferred preferences ranging from 0 to 1000, but I
>> want to map them to 1 - 5, and there are many possible mappings. To take a
>> simple example, suppose the mapping rule is the following:
>>          if (inferred_preference < 995) preference = 1;
>>          else preference = inferred_preference - 995;
>>
>> You can see that this is a really bad mapping algorithm, but if we feed the
>> generated preferences to Mahout, it is going to give me a really nice result
>> because most of the preferences are 1. So is there any other metric to
>> evaluate this?
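
To see why the bad mapping looks "accurate", here is a quick simulation
(plain Python; the uniform distribution over 0-1000 is an assumption made
for illustration): nearly all mapped preferences collapse to 1, so a
constant prediction of 1 gets a tiny absolute error.

```python
import random

def bad_map(inferred):
    # The degenerate mapping rule from the example above.
    return 1 if inferred < 995 else inferred - 995

random.seed(0)
inferred = [random.uniform(0, 1000) for _ in range(10_000)]
mapped = [bad_map(x) for x in inferred]

# About 99.5% of the mapped preferences are exactly 1, so a predictor
# that always outputs 1 scores a near-zero absolute error.
share_of_ones = mapped.count(1) / len(mapped)
print(share_of_ones)  # roughly 0.995
```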
>>
>>
>> Any help will be highly appreciated.
>>
>> Best Regards,
>> Jimmy
>>
>>
>> Zhongduo Lin (Jimmy)
>> MASc candidate in ECE department
>> University of Toronto
>>
>>
>> On 2013-05-08 4:44 AM, Sean Owen wrote:
>>> It is true that a process based on user-user similarity only won't be
>>> able to recommend item 4 in this example. This is a drawback of the
>>> algorithm and not something that can be worked around. You could try
>>> not to choose this item in the test set, but then that does not quite
>>> reflect reality in the test.
>>>
>>> If you just mean that compressing the range of pref values improves
>>> RMSE in absolute terms, yes it does of course. But not in relative
>>> terms. There is nothing inherently better or worse about a small range
>>> in this example.
>>>
>>> RMSE is a fine eval metric, but you can also consider mean average
>>> precision.
>>>
>>> Sean

