mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Recommend output: User vs. Item, Tanimoto vs. LogLikelihood
Date Sat, 23 Apr 2011 20:43:46 GMT
There is another meaning of user-based and item-based that you may see in
the literature.

If A is the binary user x item interaction matrix and h is your current
user's history (i.e. a binary vector indexed by item), then (A h) is a
vector indexed by users that contains the number of items each user has in
common with the history vector h.  We can take this vector of users and get
back to items by setting

      r = A' ( A h )

where A' is the transpose of the original interaction matrix, so that it is
items x users.  The result is a vector indexed by items: each item's score
is the sum of the overlap counts in (A h) over the users who interacted with
that item.  Pretty much all versions of recommendation that look at similar
users do something like this computation, but there may be differences in
the way that items are weighted in h, in how (A h) is truncated, or in how
the users' histories are combined.  Log likelihood ratio (LLR) based
recommenders are exactly like this, except that they ignore some values
based on statistical measures (that is, they sparsify).  SVD recommenders
use matrix decompositions to approximate A and thus smooth out the results.
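
As a minimal sketch of the schematic computation r = A' (A h), here it is in
plain Python on a made-up 3 user x 4 item matrix.  The data and the helper
names (matvec, transpose) are assumptions for illustration only; this is not
Mahout code and it does none of the weighting, truncation or sparsification
mentioned above.

```python
# Toy binary interaction matrix A: rows are users, columns are items.
# The values are invented purely for illustration.
A = [
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1
    [0, 0, 1, 1],  # user 2
]

# History of the current user: a binary vector indexed by item.
h = [1, 1, 0, 0]


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of M as a list of rows."""
    return [list(col) for col in zip(*M)]


# A h: for each user, the number of items shared with the history h.
overlap = matvec(A, h)

# A' (A h): for each item, the summed overlap of the users who touched it.
r = matvec(transpose(A), overlap)

print(overlap)  # [2, 2, 0]
print(r)        # [4, 4, 2, 0]
```

Items 0 and 1 score highest because they are already in h; among the unseen
items, item 2 outranks item 3 because user 1, who overlaps with h, also
interacted with it.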

Now, in this schematic computation, matrix multiplication is associative, so
the final recommendations can be computed either as A' (A h) or as
(A' A) h.  The second form allows a large computation, the items x items
product A' A, to be done off-line and in batch form, which can gain huge
efficiencies.  This off-line form is called item-based recommendation in the
literature.  If you are doing either SVD recommendation or LLR
recommendation then these two forms will be mathematically identical, but
computationally they will spend effort at different times.
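
A small sketch of the two orderings, again in plain Python on invented data,
with hypothetical helper names (matvec, matmul, transpose).  It only checks
that A' (A h) and (A' A) h agree for the unweighted, unsparsified case; it
is not how Mahout implements either recommender.

```python
# Toy binary interaction matrix: rows are users, columns are items.
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
h = [1, 1, 0, 0]  # current user's history, indexed by item


def transpose(M):
    """Return the transpose of M as a list of rows."""
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(X, Y):
    """Multiply matrix X by matrix Y (both lists of rows)."""
    Y_cols = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Y_cols]
            for row in X]


# "User-based" ordering: go through the user vector at request time.
r_user_based = matvec(transpose(A), matvec(A, h))

# "Item-based" ordering: build the items x items matrix A' A off-line ...
C = matmul(transpose(A), A)
# ... then a request needs only one product with the history vector.
r_item_based = matvec(C, h)

print(r_user_based)  # [4, 4, 2, 0]
print(r_item_based)  # [4, 4, 2, 0]
```

The scores match; what changes is when the work happens.  C depends only on
A, so it can be rebuilt in batch and reused for every user's h.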

On Sat, Apr 23, 2011 at 1:34 AM, Sean Owen <srowen@gmail.com> wrote:

> No it is definitely not true that you'll get the same result from a
> user-based and item-based recommender, even with the same similarity
> metric.
> They're different algorithms, but are actually purposely asymmetric too to
> take advantage of the difference, in practice, between what a user is and
> what an item is.
>
> Tanimoto and LL are different, yes. I suppose it's possible you will get
> the same recommendations with both, especially on a small toy data set.
> But no it's not true that they would always give the same result.
>
> On Sat, Apr 23, 2011 at 3:41 AM, Otis Gospodnetic <
> otis_gospodnetic@yahoo.com> wrote:
>
> > Hi,
> >
> > Given the same input data, should the same list of recommended items be
> > returned regardless of whether one uses Item-based or User-based
> > recommendations?  I always thought the answer was yes (same "matrix"
> > just flipped differently is how I imagined it), but I recently saw
> > output of some Mahout-based recommender that returned two different
> > lists of recommendations based on whether User-based or Item-based
> > approach was used.  Either the code was buggy or I was wrong. :)
> >
> > And while I'm at it, I assume that using Tanimoto vs. LogLikelihood will
> > yield different recommendations, right?  Again, I'm asking because I saw
> > some Mahout-based recommender recently that used Item-based approach and
> > returned identical lists for both Tanimoto and LogLikelihood.
> >
> > Let:
> > UB stand for User-based
> > IB stand for Item-based
> > TC stand for TanimotoCoefficient
> > LL stand for LogLikelihood
> >
> > And:
> > R1 = UB with TC
> > R2 = UB with LL
> > R3 = IB with TC
> > R4 = IB with LL
> >
> > Then:
> > R1 != R2      <== ?
> > R3 != R4      <== ?
> >
> > And:
> > R1 == R3      <== ?
> > R2 == R4      <== ?
> >
> > Thanks,
> > Otis
> > --
> > We're hiring Mahout+HBase hackers for Data Mining and Analytics
> >
> >
> http://blog.sematext.com/2011/04/18/hiring-data-mining-analytics-machine-learning-hackers/
> >
>
