mahout-user mailing list archives

From Guohua Hao <>
Subject Re: Question about implementing item-based collaborative filtering algorithms
Date Thu, 11 Feb 2010 20:35:51 GMT
Hello Sean,

First, I like your tweaks there.

Based on your example, I came up with a new extreme case that may cause
some trouble. Suppose user u has rated several items (e.g., 10 items),
all with rating 5, and we want to predict user u's rating for item i,
P_{u,i}. If item i's similarities with all of those rated items are the
same and very close to -1, we will still get P_{u,i} = 5, because the
similarity factors cancel out. However, this is still counter-intuitive:
we would expect P_{u,i} to be very close to 1 (in the 1-5 rating range),
and with more confidence.

Shall we consider this case in the code?
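
To make the extreme case concrete, here is a minimal sketch in Python. It
assumes the prediction is a plain similarity-weighted average of the user's
ratings, sum(s * r) / sum(s); the exact formula with the tweaks discussed in
this thread is not reproduced in this message, so treat that as an
assumption. The name predict_rating is hypothetical and is not Mahout's API.
Returning NaN for fewer than two rated items is only meant to mirror the
"won't make a prediction" behavior described in the quoted reply.

```python
import math


def predict_rating(ratings, similarities):
    """Similarity-weighted average of a user's ratings (assumed formula).

    ratings[k] is the user's rating for an item they already rated and
    similarities[k] is that item's similarity to the target item.
    """
    if len(ratings) != len(similarities):
        raise ValueError("ratings and similarities must have the same length")
    total_weight = sum(similarities)
    # Treat the prediction as undefined when there are fewer than two rated
    # items or when the weights sum to zero (illustrative choice, not Mahout code).
    if len(ratings) < 2 or total_weight == 0:
        return math.nan
    weighted_sum = sum(s * r for s, r in zip(similarities, ratings))
    return weighted_sum / total_weight


# Ten items, all rated 5, all with the same similarity close to -1.
ratings = [5.0] * 10
similarities = [-0.99] * 10
prediction = predict_rating(ratings, similarities)
print(prediction)
```

Under this assumption the common factor -0.99 appears in both the numerator
and the denominator, so the result is approximately 5 no matter how negative
the shared similarity is.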


On Wed, Feb 10, 2010 at 6:13 PM, Sean Owen <> wrote:

> Yes, great point. It's bad if there's only one item that the user has
> rated that has any similarity to the item being predicted. According
> to even the 'corrected' formula, the similarity value doesn't even
> matter. It cancels out. That leads to the counter-intuitive
> possibility you highlight.
> For that reason GenericItemBasedRecommender won't make a prediction in
> this situation. You could argue it's a hack but I feel it should be
> undefined in this situation.
> You could certainly throw out 3.2.1 entirely and think up something
> better, though I think with the two tweaks I've described here, its
> core logic is simple and remains sound.
> Sean
> On Thu, Feb 11, 2010 at 12:04 AM, Guohua Hao <> wrote:
> > I think you brought up a good point as to dealing with negative
> > similarities, which I have not realized before. Here is my other thought.
> > Based on your example and the proposed method, we will get a predicted
> > rating of 5 in such case after normalization. This seems
> > counter-intuitive to me, since we know that these two items are very
> > dissimilar (actually opposite correlated), a predicted rating close to 1
> > will be more intuitive to me. Maybe we need to think more about the
> > expression in section 3.2.1 of that paper.
