mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: CityBlockSimilarity details
Date Thu, 29 Mar 2012 22:04:21 GMT
It is very common for preferences or ratings to DECREASE recommendation
performance.

The basic reason is that there is little or no real signal in the ratings
after you account for the fact that the rating exists at all.

In practice, there is an additional reason: if you don't need a rating,
you can use implicit feedback, which is typically 20-100x more common than
rating data.  Ratings start off as a weak signal, and with that huge
deficit in data volume they have no chance.
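
To illustrate what "accounting for the fact that the rating exists" can look
like, here is a minimal Python sketch. It is not Mahout code: the triples, the
user and item names, and the helper `to_boolean_prefs` are all made up for
illustration. It simply throws away the rating value and keeps only which
items each user has a preference for.

```python
from collections import defaultdict

# Hypothetical (user, item, rating) triples, invented for illustration only.
ratings = [
    ("alice", "item1", 5.0),
    ("alice", "item2", 1.0),
    ("bob", "item1", 2.0),
]


def to_boolean_prefs(triples):
    """Drop the rating value and keep only the set of items per user."""
    prefs = defaultdict(set)
    for user, item, _rating in triples:
        prefs[user].add(item)
    return dict(prefs)


boolean_prefs = to_boolean_prefs(ratings)
```

Implicit events such as clicks or views could be fed through the same shape,
since they never carried a rating value in the first place.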

On Thu, Mar 29, 2012 at 2:52 PM, ziad kamel <ziad.kamel25@gmail.com> wrote:

> OK, things are becoming clearer.
>
> Will the top items selected be the same when changing the similarity? Or
> does it matter?
>
> When using Pearson similarity, which uses the preference values, I got a
> precision of 10%; when using CityBlockSimilarity I got 50%. How come I
> get higher precision when we neglect the preferences?
>
>
>
> On Thu, Mar 29, 2012 at 4:37 PM, Sean Owen <srowen@gmail.com> wrote:
> > Ah OK. The key piece you are missing is that this similarity is assuming
> > that all vector values are 0 (not present) or 1 (present). Every dimension
> > either contributes 1 to the distance (one value is 0 and the other is 1) or
> > 0 to the distance (both are 0, or both are 1). The distance is therefore
> > the "XOR" of the data: the number of dimensions in which they differ. The
> > size of the "XOR" of sets A and B is the sum of their sizes minus twice
> > the size of their intersection, hence the formula.
>
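
The set identity in the quoted explanation above can be checked with a short
Python sketch. This is not Mahout's implementation; the function names are
hypothetical, and the mapping from distance to similarity as 1 / (1 + distance)
is an assumption about how a distance like this is typically turned into a
similarity score.

```python
def city_block_distance(a, b):
    """City-block distance between two 0/1 vectors, each given as the set
    of dimensions whose value is 1: |A| + |B| - 2 * |A intersect B|."""
    return len(a) + len(b) - 2 * len(a & b)


def similarity_from_distance(distance):
    # Assumption: similarity shrinks as distance grows, via 1 / (1 + d).
    return 1.0 / (1.0 + distance)


a = {1, 2, 3}
b = {2, 3, 4, 5}
d = city_block_distance(a, b)

# The formula counts exactly the dimensions present in one set but not both,
# which is the size of the symmetric difference ("XOR") of the two sets.
assert d == len(a ^ b)
```

Only the set sizes and the overlap are needed, so no preference values enter
the calculation at all.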
