mahout-user mailing list archives

From Daniel Quach <danqu...@cs.ucla.edu>
Subject Re: How does SVDRecommender work in mahout?
Date Wed, 02 May 2012 04:50:03 GMT
I ran the factorizer on GroupLens's 1-million-rating movie dataset for 5 iterations,
with the number of features set to 10.
I then constructed an SVDRecommender from the factorization and generated preference
estimates for every user/movie pair.

For some reason, a good number of the users end up with predictions of "0.0" for every movie.
It seems to happen for every user greater than roughly 2700.
Is it perhaps a problem with the factorization? I will see if I can reproduce the output;
this seems like a bug, not expected behavior.

On a related note, is there a way to compute the full factorization, save the output, and later
retrieve some rank-K approximation? It takes hours to run the factorizer, and I feel it would
be helpful to save factorizations for reuse.
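
A minimal sketch of the save-and-reuse idea, in plain Python rather than Mahout's API. The function names (save_factorization, load_factorization, estimate) and the JSON layout are hypothetical, chosen only for illustration. One caveat: taking the first K features gives the best rank-K approximation only when the factors come from a true SVD ordered by singular value; factors learned by gradient descent or alternating least squares generally have to be retrained for a different K.

```python
import json

def save_factorization(path, user_factors, item_factors):
    """Write both factor matrices (lists of lists of floats) to one JSON file."""
    with open(path, "w") as handle:
        json.dump({"users": user_factors, "items": item_factors}, handle)

def load_factorization(path):
    """Read back the factor matrices written by save_factorization."""
    with open(path) as handle:
        data = json.load(handle)
    return data["users"], data["items"]

def estimate(user_factors, item_factors, user, item, rank=None):
    """Dot product of the user and item vectors, optionally using only
    the first `rank` features (rank=None uses all of them)."""
    user_vector = user_factors[user][:rank]
    item_vector = item_factors[item][:rank]
    return sum(a * b for a, b in zip(user_vector, item_vector))
```

With the factors on disk, estimates can be recomputed later without rerunning the factorizer.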

----- Original Message -----
From: "Sebastian Schelter" <ssc@apache.org>
To: user@mahout.apache.org
Sent: Sunday, April 29, 2012 11:31:34 PM
Subject: Re: How does SVDRecommender work in mahout?

Daniel,

You have to distinguish between explicit data (ratings from a
predefined scale) and implicit data (counting how often you observed
some behavior).

For explicit data, you can't interpret missing values as zeros,
because you simply don't know what the user would give as rating. In
order to still use matrix factorization techniques, the decomposition
has to be computed in a different way than with standard SVD
approaches. The error function stays the same as with SVD (minimize
the squared error between the original matrix and the product of the
factor matrices), but the computation uses only the known entries.
That's nothing Mahout-specific; Mahout has implementations of the
approaches described in
http://sifter.org/~simon/journal/20061211.html and in
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.173.2797&rep=rep1&type=pdf
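
To make "uses only the known entries" concrete, here is a minimal sketch of a factorization trained by stochastic gradient descent over observed (user, item, rating) triples. It is not Mahout's code: the function names, the default hyperparameters, and the choice to update all features at once (rather than one feature at a time) are simplifying assumptions for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_error(ratings, user_factors, item_factors):
    """Sum of squared errors over the observed ratings only."""
    return sum((r - dot(user_factors[u], item_factors[i])) ** 2
               for u, i, r in ratings)

def factorize(ratings, num_users, num_items, num_features=2, iterations=200,
              learning_rate=0.01, regularization=0.02, seed=0):
    """Learn user and item factor vectors from observed triples.
    Missing cells never appear in the loop, so they are not treated as zeros."""
    rng = random.Random(seed)
    user_factors = [[rng.uniform(0.05, 0.15) for _ in range(num_features)]
                    for _ in range(num_users)]
    item_factors = [[rng.uniform(0.05, 0.15) for _ in range(num_features)]
                    for _ in range(num_items)]
    for _ in range(iterations):
        for u, i, r in ratings:
            err = r - dot(user_factors[u], item_factors[i])
            for f in range(num_features):
                uf = user_factors[u][f]
                vf = item_factors[i][f]
                # Gradient step on the regularized squared error of this one rating.
                user_factors[u][f] += learning_rate * (err * vf - regularization * uf)
                item_factors[i][f] += learning_rate * (err * uf - regularization * vf)
    return user_factors, item_factors
```

The estimate for any user/item pair, rated or not, is then the dot product of the two learned vectors.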

For implicit data, the situation is different, because if you haven't
observed a user conducting some behavior with an item, then your
matrix should indeed have a 0 in that cell. The problem here is that
the user might simply not have had the opportunity to interact with a
lot of items, which means that you can't really 'trust' the zero
entries as much as the other entries. There is a great paper that
introduces a 'confidence' value for implicit data to solve this
problem: www2.research.att.com/~yifanhu/PUB/cf.pdf
Generally speaking, with this technique, the factorization uses the
whole matrix, but 'favors' non-zero entries.
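
As a rough illustration of how the confidence weighting 'favors' non-zero entries, the sketch below evaluates the objective from that paper: every cell contributes, with binary preference p = 1 if the count is positive (else 0) and confidence c = 1 + alpha * count. The function name implicit_loss and the default values of alpha and regularization are assumptions made for this example, and the code only computes the loss; it does not perform the optimization.

```python
def implicit_loss(counts, user_factors, item_factors, alpha=40.0, regularization=0.1):
    """Confidence-weighted squared error over the whole (dense) count matrix.
    counts[u][i] is how often user u was observed interacting with item i."""
    total = 0.0
    for u, row in enumerate(counts):
        for i, count in enumerate(row):
            preference = 1.0 if count > 0 else 0.0
            confidence = 1.0 + alpha * count  # zero cells keep weight 1
            estimate = sum(a * b for a, b in zip(user_factors[u], item_factors[i]))
            total += confidence * (preference - estimate) ** 2
    penalty = (sum(x * x for vector in user_factors for x in vector)
               + sum(y * y for vector in item_factors for y in vector))
    return total + regularization * penalty
```

Because a zero cell carries confidence 1 while an observed cell carries 1 + alpha * count, an error on an observed interaction costs more than the same error on an unobserved one.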

--sebastian

2012/4/29 Sean Owen <srowen@gmail.com>:
> They're implicitly zero as far as the math goes IIRC
>
> On Sun, Apr 29, 2012 at 10:45 PM, Daniel Quach <danquach@cs.ucla.edu> wrote:
>> ah sorry, I meant in the context of the SVDRecommender.
>>
>> Your earlier email mentioned that the DataModel does NOT do any subtraction, nor
>> add back in the end, ensuring the matrix remains sparse. Does that mean it inserts zero values?
