mahout-dev mailing list archives

From "Richard Simon Just (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAHOUT-371) [GSoC] Proposal to implement Distributed SVD++ Recommender using Hadoop
Date Thu, 08 Apr 2010 07:25:36 GMT
[GSoC] Proposal to implement Distributed SVD++ Recommender using Hadoop
-----------------------------------------------------------------------

                 Key: MAHOUT-371
                 URL: https://issues.apache.org/jira/browse/MAHOUT-371
             Project: Mahout
          Issue Type: New Feature
          Components: Collaborative Filtering
            Reporter: Richard Simon Just



*****Basic Proposal - just to let you know what I have in mind. I will add more detail on the
actual implementation, and some background information about myself, later today*****


Title: Proposal to implement Distributed SVD++ Recommender using Hadoop [addresses MAHOUT-329]

Student: Richard Simon Just 

Basic Proposal: 

During the Netflix Prize challenge one of the most popular forms of recommender algorithm
was matrix factorisation, in particular Singular Value Decomposition (SVD). This proposal
therefore looks to implement a distributed version of one of the most successful SVD-based
recommender algorithms from the Netflix competition: the SVD++ algorithm.

SVD++ improves upon basic SVD algorithms by incorporating implicit feedback[1]. That is to
say, it takes into account not just explicit user preferences but also implicit feedback such
as, in the case of a company like Netflix, whether a movie has been rented. Implicit feedback
assumes that the fact that there is some interaction between the user and the item is more
important than whether that interaction is positive or negative: it records that an item has
been rated, but not what the rating was.
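
For reference, the prediction rule the SVD++ implementation would target is the one given in
[1] (written here in LaTeX notation, using the paper's symbol names):

\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} \Big( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \Big)

where \mu is the global rating mean, b_u and b_i are the user and item biases, p_u and q_i are
the user and item factor vectors, N(u) is the set of items for which user u has given implicit
feedback, and the y_j are the implicit-feedback factor vectors.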

The implementation will include testing, in-depth documentation and a demo/tutorial. If there
is time, I will also look at extending the algorithm into the timeSVD++ algorithm[2]. timeSVD++
further improves the results of the SVD++ algorithm by taking into account temporal dynamics,
that is, the way users' preferences for items, and the way they rate them, change over time.
According to [2], the accuracy gains from implementing timeSVD++ are significantly bigger than
the gains from going from SVD to SVD++.
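
To give a flavour of what the temporal extension involves, the central idea in [2] is to make
the baseline predictors time-dependent. For example, the user bias becomes (sketched in LaTeX
notation; the item time bins and time-dependent user factors follow the same pattern):

b_u(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t), \qquad \mathrm{dev}_u(t) = \mathrm{sign}(t - t_u)\,|t - t_u|^{\beta}

where t_u is the mean date of user u's ratings and \beta is a constant tuned on held-out data.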

The overall project will provide three deliverables:
     1. The basic framework for distributed SVD-based recommender
     2. A distributed SVD++ implementation
     3. A distributed timeSVD++ implementation



Timeline:


The Warm Up/Bonding Period (<=May 23rd):
- familiarise myself further with the Mahout and Hadoop code bases and documentation
- discuss the proposal, design and implementation with the community, as well as any related
code tests, optimisations and documentation they would like to see incorporated into the project
- build a more detailed design of the algorithm implementation and tweak the timeline based on feedback
- familiarise myself more with unit testing
- finish building a 3-4 node Hadoop cluster and play with all the examples

Week 1 (May 24th-30th):
- start writing the backbone of the code in the form of comments and skeleton code
- implement SVDppRecommenderJob (a rough driver sketch follows this list)
- start to integrate DistributedLanczosSolver
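
As an illustrative sketch only of what the SVDppRecommenderJob driver could look like (class,
job and path names here are placeholders rather than an agreed API; it uses the plain Hadoop
Tool/ToolRunner pattern):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * Illustrative skeleton only: a driver for the proposed SVDppRecommenderJob.
 * Class, job and path names are placeholders, not an existing Mahout API.
 */
public class SVDppRecommenderJob extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // args[0]: preference data (userID,itemID,rating triples), args[1]: output directory
    Job job = new Job(getConf(), "SVD++ recommender: factor training pass");
    job.setJarByClass(SVDppRecommenderJob.class);
    // The mapper/reducer implementing one training pass would be wired in here,
    // e.g. job.setMapperClass(...) and job.setReducerClass(...), plus key/value types.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new SVDppRecommenderJob(), args));
  }
}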

Week 2 (May 31st - June 6th):
- complete DistributedLanczosSolver integration
- start implementing distributed training, prediction and regularisation (the sequential update
rules being distributed are sketched after this list)
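
For concreteness, the training and regularisation step would need to reproduce (or approximate,
once split across map/reduce passes) the stochastic gradient descent updates from [1]. Per
observed rating r_{ui}, with prediction error e_{ui} = r_{ui} - \hat{r}_{ui}, learning rate
\gamma and regularisation constant \lambda:

b_u \leftarrow b_u + \gamma (e_{ui} - \lambda b_u)
b_i \leftarrow b_i + \gamma (e_{ui} - \lambda b_i)
p_u \leftarrow p_u + \gamma (e_{ui} q_i - \lambda p_u)
q_i \leftarrow q_i + \gamma (e_{ui} (p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j) - \lambda q_i)
y_j \leftarrow y_j + \gamma (e_{ui} |N(u)|^{-1/2} q_i - \lambda y_j) \quad \text{for all } j \in N(u)

How these updates are partitioned across Hadoop jobs is exactly the design work of weeks 2-5.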

Week 3 - 5 (June 7th - 27th):
- complete implementation of distributed training, prediction and regularisation
- work on testing, cleaning up code, and tying up any loose documentation ends
- work on any documentation, tests and optimisation requested by community
- Deliverable: basic framework for a distributed SVD-based recommender

Week 6 - 7 (June 28th - July 11th):
- start implementation of SVD++ (keeping documentation and tests up-to-date)
- prepare demo

Week 8 (July 12th - 18th): Mid-Term Report by the 16th
- complete SVD++ and iron out bugs
- implement and document demo
- write wiki articles and tutorial for what has been implemented including the demo

Week 9 (July 19th - 25th):
- work on any documentation, tests and optimisation requested by the community during the project
- work on testing, cleaning up code, and tying up any loose documentation ends
- Deliverable: Distributed SVD++ Recommender (including demo)

Week 10 - 11 (July 26th - Aug 8th):
- incorporate temporal dynamics
- write temporal dynamics documentation, including wiki article

Week 12 (Aug 9th - 15th): Suggested pencils down
- last optimisation and tidy-up of code, documentation, tests and demo
- discuss with the community what comes next, and consider which JIRA issues to contribute to
- Deliverable: Distributed SVD++ Recommender with temporal dynamics

Final Evaluations Hand-in: Aug 16th-20th. 


References:

[1] - Y. Koren, "Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering
Model", ACM Press, 2008, http://public.research.att.com/~volinsky/netflix/kdd08koren.pdf
[2] - Y. Koren, "Collaborative Filtering with Temporal Dynamics", ACM Press, 2009, http://research.yahoo.com/files/kdd-fp074-koren.pdf


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

