mahout-dev mailing list archives

From "Clive Cox (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-542) MapReduce implementation of ALS-WR
Date Sun, 17 Jul 2011 15:10:59 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066668#comment-13066668 ]

Clive Cox commented on MAHOUT-542:
----------------------------------

An earlier comment said:
"...if this job here should produce recommendations for all users, we cannot naively multiply
the transpose of the user features matrix with the item features matrix to estimate all possible
preferences as these are dense matrices."

Can people explain further the problems with the naive approach?

How are people generally deriving recommendations from Matrix Factorization techniques? Do
you get a neighbourhood from some other CF algorithm and then score only that neighbourhood
using the derived matrices?
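
To make the quoted concern concrete, here is a minimal sketch (not Mahout code). It assumes the factorization has already produced one feature vector per user and per item. The names `top_n_for_user`, `user_features`, `item_features` and `rated` are hypothetical, and the toy numbers are made up for illustration. Instead of materializing the full users x items matrix of estimated preferences, it scores one user at a time and keeps only the N best unrated items in a heap. This avoids storing the dense product, but it still does users x items x features work, which is presumably why restricting scoring to a candidate set is attractive.

```python
import heapq


def dot(a, b):
    """Plain dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_n_for_user(user_vec, item_features, already_rated, n):
    """Score every unrated item for one user and keep only the n best.

    Only n (score, item) pairs are retained, so memory per user is O(n)
    rather than O(number of items).
    """
    scored = (
        (dot(user_vec, item_vec), item_id)
        for item_id, item_vec in item_features.items()
        if item_id not in already_rated
    )
    return [(item_id, score) for score, item_id in heapq.nlargest(n, scored)]


# Hypothetical toy factors with 2 features per user and item.
user_features = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
item_features = {"i1": [0.9, 0.1], "i2": [0.2, 0.8], "i3": [0.5, 0.5]}
rated = {"u1": {"i1"}, "u2": set()}

recs = {
    user_id: top_n_for_user(vec, item_features, rated[user_id], 2)
    for user_id, vec in user_features.items()
}
print(recs)
```

Scoring only a neighbourhood, as asked about above, would amount to passing a smaller candidate dictionary in place of `item_features`.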

> MapReduce implementation of ALS-WR
> ----------------------------------
>
>                 Key: MAHOUT-542
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-542
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 0.5
>
>         Attachments: MAHOUT-452.patch, MAHOUT-542-2.patch, MAHOUT-542-3.patch, MAHOUT-542-4.patch, MAHOUT-542-5.patch, MAHOUT-542-6.patch, logs.zip
>
>
> As Mahout is currently lacking a distributed collaborative filtering algorithm that uses
> matrix factorization, I spent some time reading through a couple of the Netflix papers and
> stumbled upon the "Large-scale Parallel Collaborative Filtering for the Netflix Prize" available
> at http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf.
> It describes a parallel algorithm that uses "Alternating-Least-Squares with Weighted-λ-Regularization"
> to factorize the preference-matrix and gives some insights on how the authors distributed
> the computation using Matlab.
> It seemed to me that this approach could also easily be parallelized using Map/Reduce,
> so I sat down and created a prototype version. I'm not really sure I got the mathematical
> details correct (they need some optimization anyway), but I wanna put up my prototype implementation
> here per Yonik's law of patches.
> Maybe someone has the time and motivation to work a little on this with me. It would
> be great if someone could validate the approach taken (I'm willing to help as the code might
> not be intuitive to read) and could try to factorize some test data and give feedback then.
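
For readers unfamiliar with ALS-WR, the following is a rough single-machine sketch of one user-feature update as I understand it from the paper cited in the issue description. It is not the code in the attached patches. With the item features held fixed, each user vector is the solution of a small regularized least-squares system, (sum of m m^T over the user's rated items + lambda * n_u * I) u = sum of r * m, where n_u is the number of ratings by that user. The helper names `solve` and `update_user` and the toy data are hypothetical, and lambda is assumed to be positive so the system is non-singular.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination
    with partial pivoting. a is a k x k list of lists, b has length k."""
    k = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (aug[r][k] - tail) / aug[r][r]
    return x


def update_user(ratings, item_features, lam):
    """Recompute one user's feature vector with item features held fixed.

    ratings: {item_id: rating} for this user only.
    The regularization term is weighted by the user's rating count,
    which is the "weighted-lambda" part of ALS-WR.
    """
    k = len(next(iter(item_features.values())))
    n = len(ratings)
    a = [[lam * n if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for item_id, rating in ratings.items():
        m = item_features[item_id]
        for i in range(k):
            b[i] += rating * m[i]
            for j in range(k):
                a[i][j] += m[i] * m[j]
    return solve(a, b)


# Hypothetical toy data: two items with 2 features each, one user.
item_features = {"i1": [1.0, 0.0], "i2": [0.0, 1.0]}
u = update_user({"i1": 4.0, "i2": 2.0}, item_features, lam=0.5)
print(u)
```

The item update is symmetric (swap the roles of users and items), and the algorithm alternates between the two. Each user's update depends only on that user's ratings and the current item features, which is what makes the step straightforward to distribute.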

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
