mahout-dev mailing list archives

From "Sean Owen (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-898) Error in formula for preference estimation in GenericItemBasedRecommender
Date Mon, 28 Nov 2011 19:21:40 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158657#comment-13158657 ]

Sean Owen commented on MAHOUT-898:
----------------------------------

Yes, I could imagine this improves metrics in some cases. For example, I ran a little test
and actually saw a small RMSE decrease over the existing implementation. I truly don't know
whether it's going to help or hurt things overall.

I would actually phrase your suggestion differently: instead of construing a negative weight
as a vote against a value in the weighted average, it's construing it as a *positive* vote
for the *opposite* value. Here "opposite" means the negative of the rating. And that's the only
bit I have a problem with, conceptually. If the opposite of 4 on a scale of 5 were 2, instead
of -4, it would seem complete. (Really, the opposite should be as far below the user's mean
rating as 4 is above it. It happens to do that automatically if the mean is already 0, yes,
but the mean won't be 0 in general.)
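
To make the "opposite value" reading concrete, here is a minimal Java sketch. It is not
Mahout code: the class and method names (ReflectedVoteSketch, reflect, estimate) are
hypothetical, and the inputs are made-up numbers. It assumes that a negative similarity
counts with its absolute value as the weight and votes for the rating reflected about a
chosen center, such as the user's mean rating.

```java
public class ReflectedVoteSketch {

    /** Reflects a rating about a center: the result is as far below the center as the rating is above it. */
    public static double reflect(double rating, double center) {
        return 2.0 * center - rating;
    }

    /**
     * Weighted average in which a negative similarity is treated as a positive
     * vote (weight = |similarity|) for the reflected rating.
     * Returns NaN if all similarities are zero or the arrays are empty.
     */
    public static double estimate(double[] similarities, double[] preferences, double center) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            double weight = Math.abs(similarities[i]);
            double vote = similarities[i] >= 0.0
                    ? preferences[i]
                    : reflect(preferences[i], center);
            numerator += weight * vote;
            denominator += weight;
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Made-up data: one positively and one negatively correlated item.
        double[] similarities = {0.5, -0.4};
        double[] preferences = {4.0, 1.0};

        System.out.println(reflect(4.0, 3.0)); // opposite of 4 around a center of 3
        System.out.println(reflect(4.0, 0.0)); // opposite of 4 around a center of 0

        // With center 0 this reduces to sum(s * r) / sum(|s|).
        System.out.println(estimate(similarities, preferences, 0.0));
        // With center 3 the negatively correlated rating of 1 votes for 5.
        System.out.println(estimate(similarities, preferences, 3.0));
    }
}
```

With a center of 0 the reflected rating is simply the negated rating, so the estimate is the
same as dividing the signed weighted sum by the sum of absolute similarities. With any other
center the two differ, which is the conceptual gap described above.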

I think that's a perfectly coherent strategy, one I hadn't thought of before. It is different
from what's in the literature and what's been in the code. I still hesitate to change the
simple weighted average here. At the same time I think it would be fine to incorporate this
other strategy.

We could make this pluggable, with a default implementation that does what the algorithm does
today. It adds yet another hook and pluggable module to worry about, but I don't think it's
so bad.
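
One way such a hook might look, as a rough sketch only: the interface and class names below
(EstimationStrategy, SimpleWeightedAverage, AbsoluteWeightAverage, Recommender) are
hypothetical and do not exist in Mahout. The assumption is that the recommender delegates
the final averaging step to a strategy object and falls back to the simple weighted average
when none is supplied.

```java
public class PluggableEstimationSketch {

    /** Hypothetical hook; not an existing Mahout interface. */
    public interface EstimationStrategy {
        double estimate(double[] similarities, double[] preferences);
    }

    /** Default: plain weighted average, normalized by the signed sum of similarities. */
    public static final class SimpleWeightedAverage implements EstimationStrategy {
        @Override
        public double estimate(double[] similarities, double[] preferences) {
            double numerator = 0.0;
            double denominator = 0.0;
            for (int i = 0; i < similarities.length; i++) {
                numerator += similarities[i] * preferences[i];
                denominator += similarities[i];
            }
            return numerator / denominator;
        }
    }

    /** Alternative: same numerator, normalized by the sum of absolute similarities. */
    public static final class AbsoluteWeightAverage implements EstimationStrategy {
        @Override
        public double estimate(double[] similarities, double[] preferences) {
            double numerator = 0.0;
            double denominator = 0.0;
            for (int i = 0; i < similarities.length; i++) {
                numerator += similarities[i] * preferences[i];
                denominator += Math.abs(similarities[i]);
            }
            return numerator / denominator;
        }
    }

    /** Stand-in for a recommender that delegates the averaging step. */
    public static final class Recommender {
        private final EstimationStrategy strategy;

        public Recommender() {
            this(new SimpleWeightedAverage()); // keeps the existing behavior by default
        }

        public Recommender(EstimationStrategy strategy) {
            this.strategy = strategy;
        }

        public double estimatePreference(double[] similarities, double[] preferences) {
            return strategy.estimate(similarities, preferences);
        }
    }

    public static void main(String[] args) {
        double[] similarities = {0.8, 0.2};
        double[] preferences = {5.0, 3.0};
        // With only positive similarities both strategies agree.
        System.out.println(new Recommender().estimatePreference(similarities, preferences));
        System.out.println(new Recommender(new AbsoluteWeightAverage())
                .estimatePreference(similarities, preferences));
    }
}
```

Callers that never pass a strategy would see no change in behavior, which is the point of
keeping the current algorithm as the default.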

Am I missing anything easier? Looking for a way to balance the many issues in this thread
as best we can.

                
> Error in formula for preference estimation in GenericItemBasedRecommender
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-898
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-898
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>         Environment: mahout-core
>            Reporter: Paulo Villegas
>            Assignee: Sean Owen
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.6
>
>         Attachments: GenericItemBasedRecommender.diff
>
>
> The formula to estimate the preference for an item in the Taste item-based recommender
> normalizes by the sum of similarities for items used in estimation. But the terms in the sum
> taken to normalize should be in absolute value, since they can be negative (e.g. when using
> Pearson correlation, similarity is in [-1,1]). Now they are not, and as a result when there
> are negative and positive values they cancel out, giving a small denominator and incorrectly
> boosting the preference for the item (symptom: it is easy for a predicted preference to take
> the maximum value, since the quotient becomes large and it is capped afterwards).
> The patch is rather trivial (a one-liner, actually) for src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.java
> Note: the same error & suggested fix happens in GenericUserBasedRecommender
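
The cancellation described in the report can be reproduced with a small standalone Java
sketch. This is not the code from GenericItemBasedRecommender or from the attached patch:
the class and method names are hypothetical, the similarities and preferences are made-up
numbers, and a 1 to 5 rating scale is assumed for the capping step.

```java
public class WeightedAverageDemo {

    /** Normalizes by the signed sum of similarities, as the report describes the current behavior. */
    public static double signedDenominator(double[] similarities, double[] preferences) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            numerator += similarities[i] * preferences[i];
            denominator += similarities[i];
        }
        return numerator / denominator;
    }

    /** Normalizes by the sum of absolute similarities, as the report proposes. */
    public static double absoluteDenominator(double[] similarities, double[] preferences) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            numerator += similarities[i] * preferences[i];
            denominator += Math.abs(similarities[i]);
        }
        return numerator / denominator;
    }

    /** Clamps an estimate to the allowed rating range. */
    public static double cap(double value, double min, double max) {
        return Math.max(min, Math.min(max, value));
    }

    public static void main(String[] args) {
        // One positive and one negative similarity of similar magnitude.
        double[] similarities = {0.5, -0.4};
        double[] preferences = {4.0, 1.0};

        // Signed denominator is about 0.1, so the quotient is about 16
        // and gets capped to the maximum rating.
        double signed = signedDenominator(similarities, preferences);
        System.out.println("signed:   " + signed + " -> capped " + cap(signed, 1.0, 5.0));

        // Absolute denominator is 0.9, so the quotient stays inside the scale.
        double absolute = absoluteDenominator(similarities, preferences);
        System.out.println("absolute: " + absolute + " -> capped " + cap(absolute, 1.0, 5.0));
    }
}
```

Here the numerator is 1.6 in both cases; only the denominator changes, from roughly 0.1 to
0.9, which is what moves the estimate from the capped maximum back into the rating range.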

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
