mahout-user mailing list archives

From Lance Norskog <>
Subject Re: Factorizing SVD (online SVDRecommender)
Date Wed, 09 Nov 2011 00:02:41 GMT
Thank you. Maybe it should be called FactorizedRecommender instead, since
there is no real SVD?

On Sun, Nov 6, 2011 at 11:58 PM, Sebastian Schelter <> wrote:

> I think it's also important to note that in the recommendation world,
> what gets used most of the time is something that only looks like an SVD.
> In most recommendation use cases, only a tiny fraction of the
> user-item matrix is observed. Therefore "standard" algorithms for
> computing the SVD cannot be used (at least that's what the papers say).
> The usual approach is to decompose the rating matrix A (users x items)
> into the product of two other matrices X (users x features) and Y (items
> x features). A is approximated by XY'.
> This decomposition is obtained by selecting X and Y in a way that
> minimizes the overall squared error in regard to the observed ratings
> (plus some regularization to avoid overfitting).
> For each observed user-item pair (u,i), the prediction is obtained by
> multiplying the corresponding user and item vectors from the latent
> feature space, so the squared error to minimize is:
> f(X,Y) = sum_{u,i} (r_{u,i} - x_u' y_i)^2 + some regularization term
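>
> Purely for illustration, evaluating that objective for a couple of
> observed ratings could look like the following plain Java (class name,
> dimensions and numbers are invented, this is not Mahout's API):
>
> // Toy example: compute f(X,Y) over the observed cells of a tiny matrix.
> public class FactorizationObjective {
>   public static void main(String[] args) {
>     double lambda = 0.05;                       // regularization weight
>     double[][] x = {{0.9, 0.1}, {0.3, 0.8}};    // user factors (users x k)
>     double[][] y = {{1.0, 0.2}, {0.1, 1.1}};    // item factors (items x k)
>     int[][] observed = {{0, 0, 4}, {1, 1, 3}};  // (user, item, rating)
>
>     double objective = 0.0;
>     for (int[] cell : observed) {
>       int u = cell[0], i = cell[1];
>       double prediction = dot(x[u], y[i]);      // x_u' y_i
>       double err = cell[2] - prediction;
>       objective += err * err;
>     }
>     // regularization term: lambda * (sum of all squared factor values)
>     for (double[] row : x) for (double v : row) objective += lambda * v * v;
>     for (double[] row : y) for (double v : row) objective += lambda * v * v;
>     System.out.println("f(X,Y) = " + objective);
>   }
>
>   static double dot(double[] a, double[] b) {
>     double s = 0.0;
>     for (int f = 0; f < a.length; f++) s += a[f] * b[f];
>     return s;
>   }
> }
>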
> There are several ways to find the X and Y that minimize this function.
> One approach is to use Stochastic Gradient Descent. This approach was
> made popular by Simon Funk's famous article during the Netflix Prize:
> "Netflix Update: Try this at home"
> I think that's also what's implemented in our
> ExpectationMaximizationSVDFactorizer.
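>
> As a rough sketch of the SGD idea (not the actual factorizer code;
> everything here is simplified and the numbers are made up): for every
> observed rating you nudge x_u and y_i along the gradient of the
> squared error for that single cell, e.g.
>
> // Toy SGD factorizer: repeatedly sweep over the observed ratings.
> public class FunkSgdSketch {
>   public static void main(String[] args) {
>     int k = 2;
>     double learningRate = 0.01, lambda = 0.05;
>     double[][] x = {{0.1, 0.1}, {0.1, 0.1}};            // user factors
>     double[][] y = {{0.1, 0.1}, {0.1, 0.1}};            // item factors
>     int[][] observed = {{0, 0, 4}, {0, 1, 2}, {1, 1, 5}}; // (u, i, rating)
>
>     for (int epoch = 0; epoch < 200; epoch++) {
>       for (int[] cell : observed) {
>         int u = cell[0], i = cell[1];
>         double prediction = 0.0;
>         for (int f = 0; f < k; f++) prediction += x[u][f] * y[i][f];
>         double err = cell[2] - prediction;
>         // regularized gradient step on x_u and y_i for this one cell
>         for (int f = 0; f < k; f++) {
>           double xu = x[u][f], yi = y[i][f];
>           x[u][f] += learningRate * (err * yi - lambda * xu);
>           y[i][f] += learningRate * (err * xu - lambda * yi);
>         }
>       }
>     }
>     System.out.println("predicted r(1,1) ~ "
>         + (x[1][0] * y[1][0] + x[1][1] * y[1][1]));
>   }
> }
>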
> Other approaches use a technique called "Alternating Least Squares"
> where you iteratively fix either X or Y and solve the equation for the
> other.
> This approach is described in "Large-scale Parallel Collaborative
> Filtering for the Netflix Prize".
> Our sequential ALSWRFactorizer is an implementation of this, as is the
> recently refactored ParallelALSFactorizationJob, which brings the
> algorithm onto Hadoop.
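>
> To give a flavour of the alternating step (again just a toy sketch
> with invented numbers, not what ALSWRFactorizer actually does): with Y
> held fixed, each user vector is the solution of a small k x k
> least-squares system built from that user's observed ratings.
>
> // One ALS half-step: recompute every user vector with the item
> // vectors held fixed.
> public class AlsHalfStepSketch {
>   public static void main(String[] args) {
>     int k = 2;
>     double lambda = 0.05;
>     double[][] y = {{1.0, 0.2}, {0.1, 1.1}, {0.7, 0.7}}; // item factors
>     int[][] ratedItems = {{0, 2}, {1, 2}};               // items per user
>     double[][] ratings = {{4.0, 3.0}, {2.0, 5.0}};       // matching ratings
>     double[][] x = new double[ratedItems.length][k];
>
>     for (int u = 0; u < x.length; u++) {
>       // build A = sum_i y_i y_i' + lambda * I and b = sum_i r_ui * y_i
>       double[][] a = new double[k][k];
>       double[] b = new double[k];
>       for (int n = 0; n < ratedItems[u].length; n++) {
>         double[] yi = y[ratedItems[u][n]];
>         for (int p = 0; p < k; p++) {
>           b[p] += ratings[u][n] * yi[p];
>           for (int q = 0; q < k; q++) a[p][q] += yi[p] * yi[q];
>         }
>       }
>       for (int p = 0; p < k; p++) a[p][p] += lambda;
>       x[u] = solve(a, b);                                // x_u = A^-1 b
>     }
>     System.out.println("x_0 = " + x[0][0] + ", " + x[0][1]);
>   }
>
>   // naive Gaussian elimination, fine for a tiny k x k system
>   static double[] solve(double[][] a, double[] b) {
>     int n = b.length;
>     for (int col = 0; col < n; col++) {
>       for (int row = col + 1; row < n; row++) {
>         double factor = a[row][col] / a[col][col];
>         for (int j = col; j < n; j++) a[row][j] -= factor * a[col][j];
>         b[row] -= factor * b[col];
>       }
>     }
>     double[] result = new double[n];
>     for (int row = n - 1; row >= 0; row--) {
>       double s = b[row];
>       for (int j = row + 1; j < n; j++) s -= a[row][j] * result[j];
>       result[row] = s / a[row][row];
>     }
>     return result;
>   }
> }
>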
> There is a second very interesting paper that shows how to use the ALS
> approach with implicit feedback data: "Collaborative Filtering for
> Implicit Feedback Datasets",
> I think Tamas's sequential factorizer in MAHOUT-737 is a direct
> implementation of this and I also plan to include it in
> ParallelALSFactorizationJob.
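>
> For reference, that paper replaces the ratings with binary preferences
> p_{u,i} weighted by confidences c_{u,i} = 1 + alpha * r_{u,i}, and the
> sum then runs over all user-item cells, not only the observed ones.
> A toy evaluation of that objective (invented numbers, nothing to do
> with the MAHOUT-737 code) might look like:
>
> // Implicit-feedback objective: confidence-weighted squared error over
> // every cell of a small interaction matrix.
> public class ImplicitFeedbackObjective {
>   public static void main(String[] args) {
>     double alpha = 40.0, lambda = 0.05;
>     double[][] x = {{0.3, 0.1}, {0.2, 0.4}};   // user factors
>     double[][] y = {{0.5, 0.1}, {0.1, 0.6}};   // item factors
>     double[][] r = {{3.0, 0.0}, {0.0, 7.0}};   // raw interaction counts
>
>     double objective = 0.0;
>     for (int u = 0; u < r.length; u++) {
>       for (int i = 0; i < r[u].length; i++) {
>         double p = r[u][i] > 0 ? 1.0 : 0.0;    // binary preference
>         double c = 1.0 + alpha * r[u][i];      // confidence in it
>         double prediction = x[u][0] * y[i][0] + x[u][1] * y[i][1];
>         objective += c * (p - prediction) * (p - prediction);
>       }
>     }
>     for (double[] row : x) for (double v : row) objective += lambda * v * v;
>     for (double[] row : y) for (double v : row) objective += lambda * v * v;
>     System.out.println("implicit-feedback objective = " + objective);
>   }
> }
>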
> --sebastian
> On 07.11.2011 07:10, Sean Owen wrote:
> > It's embedded in both as far as I can tell, though I don't know enough
> > about the implementation to say what the split is. Ted remarked once
> > that it's usual to split sqrt(S) across both.
> >
> > Dotting two vectors to get one rating is really just the process of
> > matrix multiplication to recover a value from the approximate
> > user-item matrix. A = U S V', and we have some truncated versions of
> > those right matrices. Uk Sk V'k gives Ak, which is some approximation
> > of the original A (the input) but with many new values filled in from
> > which to make recommendations. The Uk and V'k actually already have Sk
> > embedded. So to get one value in Ak is just a dot of a row of Uk with
> > a column of V'k, per usual matrix multiplication.
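> >
> > As a purely numeric toy (made-up values, not the SVDRecommender
> > code), the point that Sk can be folded into both sides looks like:
> >
> > // One entry of Ak = Uk Sk V'k is a dot product of a (scaled) user
> > // row with a (scaled) item row; once sqrt(Sk) is folded into both
> > // factor matrices, no separate S is needed at prediction time.
> > public class TruncatedSvdPrediction {
> >   public static void main(String[] args) {
> >     double[] s = {3.0, 1.0};                  // singular values
> >     double[][] u = {{0.8, 0.6}, {0.6, -0.8}}; // Uk (users x k)
> >     double[][] v = {{0.7, 0.7}, {0.7, -0.7}}; // Vk (items x k)
> >     int user = 0, item = 1, k = s.length;
> >
> >     // explicit form: a_{u,i} = sum_f U[u][f] * S[f] * V[i][f]
> >     double explicitValue = 0.0;
> >     for (int f = 0; f < k; f++) {
> >       explicitValue += u[user][f] * s[f] * v[item][f];
> >     }
> >
> >     // embedded form: fold sqrt(S) into both sides, then just dot rows
> >     double embeddedValue = 0.0;
> >     for (int f = 0; f < k; f++) {
> >       double userFeature = u[user][f] * Math.sqrt(s[f]);
> >       double itemFeature = v[item][f] * Math.sqrt(s[f]);
> >       embeddedValue += userFeature * itemFeature;
> >     }
> >     // both give the same value (up to floating-point rounding)
> >     System.out.println(explicitValue + " ~ " + embeddedValue);
> >   }
> > }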
> >
> > On Sun, Nov 6, 2011 at 11:53 PM, Lance Norskog <>
> wrote:
> >> This thread is about the class SVDRecommender, which uses an externally
> >> created factorization to do recommendation.
> >>
> >> A: The Factorization classes do not extract the scaling diagonal
> matrix. Is
> >> this embedded in the left or right matrix? Or spread across both?
> >> B: Is there an explanation of why the dot product of two feature vectors
> >> creates the preference? A paper or blog post? Or a paragraph in a reply?
> >>
> >> --
> >> Lance Norskog
> >>
> >>

Lance Norskog
