I was also somewhat dubious about that name. Those algorithms are
factorizations all right, but they don't technically do what the
mathematical SVD definition says (even granting that in practice the
mathematically strict SVD contract may be relaxed). Even though some
published algorithms derive their names from SVD to emphasize the
strong connection (SVD++ comes to mind), they are still very careful to
point out that they are not producing an actual mathematical SVD.
On Nov 8, 2011 4:03 PM, "Lance Norskog" <goksron@gmail.com> wrote:
> Thank you. Maybe it should be called FactorizedRecommender instead, since
> there is no real SVD?
>
> On Sun, Nov 6, 2011 at 11:58 PM, Sebastian Schelter <ssc@apache.org>
> wrote:
>
> > I think it's also important to note that in the recommendation world,
> > most of the time something which only looks like an SVD is used.
> >
> > In most recommendation use cases, only a tiny fraction of the
> > user-item matrix is observed. Therefore "standard" algorithms for
> > obtaining the SVD cannot be used (at least that's what the papers say).
> >
> > The usual approach is to decompose the rating matrix A (users x items)
> > into the product of two other matrices X (users x features) and Y (items
> > x features). A is approximated by XY'.
> >
> > This decomposition is obtained by selecting X and Y in a way that
> > minimizes the overall squared error in regard to the observed ratings
> > (plus some regularization to avoid overfitting).
> >
> > For all observed user-item pairs (u,i), the prediction is obtained by
> > multiplying the corresponding user and item vector from the latent
> > feature space, and the squared error of these predictions is summed:
> >
> > f(X,Y) = sum_{u,i} (r_{u,i} - x_u' y_i)^2 + some regularization term
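(A minimal numpy sketch of evaluating that objective, assuming an L2 penalty on X and Y as the regularization term; the toy data and the weight `lam` are just illustrative choices, not anything from Mahout:)

```python
import numpy as np

def objective(X, Y, observed, lam):
    """f(X,Y) = sum over observed (u,i) of (r_ui - x_u' y_i)^2
    plus an L2 regularization term lam * (||X||_F^2 + ||Y||_F^2)."""
    err = sum((r - X[u] @ Y[i]) ** 2 for u, i, r in observed)
    reg = lam * (np.sum(X ** 2) + np.sum(Y ** 2))
    return err + reg

# Toy example: 2 users, 2 items, 2 latent features.
X = np.array([[1.0, 0.0], [0.0, 1.0]])   # users x features
Y = np.array([[1.0, 0.0], [0.0, 1.0]])   # items x features
observed = [(0, 0, 1.0), (1, 1, 1.0)]    # both perfectly predicted

print(objective(X, Y, observed, lam=0.0))  # 0.0
```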
> >
> > There are several ways to find the X and Y that minimize this function.
> >
> > One approach is to use Stochastic Gradient Descent. This approach was
> > made popular by Simon Funk's famous article during the Netflix Prize:
> >
> > "Netflix Update: Try this at home"
> > http://sifter.org/~simon/journal/20061211.html
> >
> > I think that's also what's implemented in our
> > ExpectationMaximizationSVDFactorizer.
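(To make the SGD idea concrete, here is a small Funk-style sketch in numpy. This is not the actual Mahout code; the learning rate, epoch count and regularization weight are arbitrary illustrative values. For each observed rating, the user and item vectors are nudged along the gradient of the regularized squared error:)

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_factorize(observed, n_users, n_items, k=2,
                  lam=0.01, lr=0.01, epochs=500):
    """For each observed rating r_ui, compute the prediction error and
    move x_u and y_i a small step along the negative gradient."""
    X = 0.1 * rng.standard_normal((n_users, k))
    Y = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for u, i, r in observed:
            e = r - X[u] @ Y[i]                  # prediction error
            X[u] += lr * (e * Y[i] - lam * X[u])
            Y[i] += lr * (e * X[u] - lam * Y[i])
    return X, Y

# Toy data: two users with opposite tastes over two items.
observed = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 1.0), (1, 1, 5.0)]
X, Y = sgd_factorize(observed, n_users=2, n_items=2)
print(round(float(X[0] @ Y[0]), 1))  # close to the observed 5.0
```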
> >
> > Other approaches use a technique called "Alternating Least Squares"
> > where you iteratively fix either X or Y and solve the equation for the
> > other.
> >
> > This approach is described in: "Large-scale Parallel Collaborative
> > Filtering for the Netflix Prize",
> >
> >
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
> >
> > Our sequential ALSWRFactorizer is an implementation of this, as is
> > the recently refactored ParallelALSFactorizationJob, which takes this
> > algorithm onto Hadoop.
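(A compact numpy sketch of the alternating idea, again purely illustrative rather than Mahout's implementation: with Y fixed, each row of X is the solution of a small ridge-regression problem over the items that user rated, and vice versa:)

```python
import numpy as np

def als(A, mask, k=2, lam=0.1, iters=20, seed=0):
    """Alternating least squares on the observed entries of A.
    mask[u, i] is True where rating A[u, i] is observed."""
    rng = np.random.default_rng(seed)
    n_users, n_items = A.shape
    X = rng.standard_normal((n_users, k))
    Y = rng.standard_normal((n_items, k))
    for _ in range(iters):
        for u in range(n_users):                 # fix Y, solve for x_u
            idx = mask[u].nonzero()[0]           # items rated by user u
            Yi = Y[idx]
            X[u] = np.linalg.solve(Yi.T @ Yi + lam * np.eye(k),
                                   Yi.T @ A[u, idx])
        for i in range(n_items):                 # fix X, solve for y_i
            idx = mask[:, i].nonzero()[0]        # users who rated item i
            Xu = X[idx]
            Y[i] = np.linalg.solve(Xu.T @ Xu + lam * np.eye(k),
                                   Xu.T @ A[idx, i])
    return X, Y

A = np.array([[5.0, 1.0], [1.0, 5.0]])
mask = np.ones_like(A, dtype=bool)               # fully observed toy case
X, Y = als(A, mask)
print(np.round(X @ Y.T, 1))                      # approximately A
```

Because each row solve only touches that user's (or item's) observed ratings, the rows are independent, which is what makes the Hadoop parallelization natural.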
> >
> > There is a second very interesting paper that shows how to use the ALS
> > approach with implicit feedback data: "Collaborative Filtering for
> > Implicit Feedback Datasets", http://research.yahoo.com/pub/2433
> >
> > I think Tamas's sequential factorizer in MAHOUT-737 is a direct
> > implementation of this and I also plan to include it in
> > ParallelALSFactorizationJob.
> >
> > sebastian
> >
> >
> > On 07.11.2011 07:10, Sean Owen wrote:
> > > It's embedded in both as far as I can tell, though I don't know enough
> > > about the implementation to say what the split is. Ted remarked once
> > > that it's usual to split sqrt(S) across both.
> > >
> > > Dotting two vectors to get one rating is really just the process of
> > > matrix multiplication to recover a value from the approximate
> > > user-item matrix. A = U S V', and we have some truncated versions of
> > > those right matrices. Uk Sk V'k gives Ak, which is some approximation
> > > of the original A (the input) but with many new values filled in from
> > > which to make recommendations. The Uk and V'k actually already have Sk
> > > embedded. So to get one value in Ak is just a dot of a row of Uk with
> > > a column of V'k, per usual matrix multiplication.
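(A small numpy sketch of that point, using a made-up 3x3 ratings matrix: truncate the SVD to k factors, fold sqrt(S) into both sides as described, and recover any single rating as one dot product:)

```python
import numpy as np

# Toy ratings matrix A (users x items); zeros stand in for unobserved.
A = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Split sqrt(S) across both factors, so S is embedded in Uk and Vk.
Uk = U[:, :k] * np.sqrt(s[:k])        # users x k
Vk = Vt[:k, :].T * np.sqrt(s[:k])     # items x k

Ak = Uk @ Vk.T                        # rank-k approximation of A
# One predicted rating = one dot product of a user row with an item row:
pred = Uk[0] @ Vk[2]
print(np.allclose(pred, Ak[0, 2]))    # True
```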
> > >
> > > On Sun, Nov 6, 2011 at 11:53 PM, Lance Norskog <goksron@gmail.com>
> > wrote:
> > >> This thread is about the class SVDRecommender, which uses an
> externally
> > >> created factorization to do recommendation.
> > >>
> > >> A: The Factorization classes do not extract the scaling diagonal
> > matrix. Is
> > >> this embedded in the left or right matrix? Or spread across both?
> > >> B: Is there an explanation of why the dot product of two feature
> vectors
> > >> creates the preference? A paper or blog post? Or a paragraph in a
> reply?
> > >>
> > >> 
> > >> Lance Norskog
> > >> goksron@gmail.com
> > >>
> >
> >
>
>
> 
> Lance Norskog
> goksron@gmail.com
>
