mahout-dev mailing list archives

From "Peng Cheng (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAHOUT-1274) SGD-based Online SVD recommender
Date Sat, 06 Jul 2013 17:17:48 GMT
Peng Cheng created MAHOUT-1274:
----------------------------------

             Summary: SGD-based Online SVD recommender
                 Key: MAHOUT-1274
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1274
             Project: Mahout
          Issue Type: New Feature
          Components: Collaborative Filtering
            Reporter: Peng Cheng
            Assignee: Sean Owen


An online SVD recommender is otherwise similar to an offline SVD recommender, except that,
upon receiving one or several new ratings, it can add them to the training dataModel
and update the result accordingly in real time.
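For concreteness, this is the standard regularized SGD step for plain matrix factorization on a single rating r_ui. It omits the bias terms an actual Mahout SGD factorizer may use; that simplification is an assumption of this sketch. Here p_u and q_i are the k-dimensional user and item vectors, gamma is the learning rate and lambda the regularization weight, and both updates use the values from before the step. Each step costs O(k), independent of how many ratings are stored, which is what makes a per-rating online update cheap.

```latex
\begin{aligned}
e_{ui} &= r_{ui} - \mathbf{p}_u^\top \mathbf{q}_i \\
\mathbf{p}_u &\leftarrow \mathbf{p}_u + \gamma\,(e_{ui}\,\mathbf{q}_i - \lambda\,\mathbf{p}_u) \\
\mathbf{q}_i &\leftarrow \mathbf{q}_i + \gamma\,(e_{ui}\,\mathbf{p}_u - \lambda\,\mathbf{q}_i)
\end{aligned}
```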

An online SVD recommender should override setPreference(...) and removePreference(...) in
AbstractRecommender so that the factorization result is updated in O(1) time (independent of
the number of stored ratings) and without retraining.
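A minimal sketch, in Java, of what such an override could do. It does not extend Mahout's AbstractRecommender: the class OnlineSgdFactorizer, its constructor parameters, estimatePreference and numRatings are hypothetical names for illustration, and only setPreference/removePreference borrow the method names and argument types of Mahout's Recommender. Each incoming rating gets a fixed number of plain SGD steps (no bias terms, an assumption). New users and items simply get a new randomly initialized vector. Removal only drops the stored rating, because undoing earlier SGD steps is left open here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch, not Mahout code: an online SGD matrix factorizer. */
class OnlineSgdFactorizer {
  private final int rank;            // number of latent features k
  private final double learningRate; // gamma, assumed constant
  private final double lambda;       // L2 regularization strength
  private final int stepsPerUpdate;  // SGD steps spent on each incoming rating
  private final Random random;
  private final Map<Long, double[]> userFeatures = new HashMap<>();
  private final Map<Long, double[]> itemFeatures = new HashMap<>();
  // Raw ratings, kept so that later passes could replay them.
  private final Map<Long, Map<Long, Float>> ratings = new HashMap<>();

  OnlineSgdFactorizer(int rank, double learningRate, double lambda, int stepsPerUpdate, long seed) {
    this.rank = rank;
    this.learningRate = learningRate;
    this.lambda = lambda;
    this.stepsPerUpdate = stepsPerUpdate;
    this.random = new Random(seed);
  }

  /** A new user or item only adds one row, initialized with small random values. */
  private double[] featuresFor(Map<Long, double[]> table, long id) {
    return table.computeIfAbsent(id, key -> {
      double[] v = new double[rank];
      for (int f = 0; f < rank; f++) {
        v[f] = 0.1 * random.nextGaussian();
      }
      return v;
    });
  }

  private static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int f = 0; f < a.length; f++) {
      sum += a[f] * b[f];
    }
    return sum;
  }

  /** Predicted rating p_u . q_i, or NaN if the user or item is unknown. */
  double estimatePreference(long userID, long itemID) {
    double[] p = userFeatures.get(userID);
    double[] q = itemFeatures.get(itemID);
    return (p == null || q == null) ? Double.NaN : dot(p, q);
  }

  /** Stores the rating, then runs stepsPerUpdate SGD steps on it:
   *  O(stepsPerUpdate * rank) work, independent of how many ratings exist. */
  void setPreference(long userID, long itemID, float value) {
    ratings.computeIfAbsent(userID, key -> new HashMap<>()).put(itemID, value);
    double[] p = featuresFor(userFeatures, userID);
    double[] q = featuresFor(itemFeatures, itemID);
    for (int step = 0; step < stepsPerUpdate; step++) {
      double err = value - dot(p, q);
      for (int f = 0; f < rank; f++) {
        double pf = p[f]; // pre-step values, so both updates use the same point
        double qf = q[f];
        p[f] += learningRate * (err * qf - lambda * pf);
        q[f] += learningRate * (err * pf - lambda * qf);
      }
    }
  }

  /** Removes the stored rating only; the factors are left untouched. */
  void removePreference(long userID, long itemID) {
    Map<Long, Float> row = ratings.get(userID);
    if (row != null) {
      row.remove(itemID);
    }
  }

  int numRatings() {
    int n = 0;
    for (Map<Long, Float> row : ratings.values()) {
      n += row.size();
    }
    return n;
  }
}
```

For example, new OnlineSgdFactorizer(5, 0.05, 0.01, 100, 42L) followed by setPreference(1L, 10L, 4.0f) should leave estimatePreference(1L, 10L) close to 4. Fitting only the newest rating like this is exactly the one-pass shortcut whose limits are discussed under Implementation.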

Right now SlopeOneRecommender is the only component with this capability.

Since SGD is intrinsically an online algorithm and its CF implementation is available in core-0.8
(see MAHOUT-1089, MAHOUT-1272), I presume now would be a good time to convert it. Such a feature
could come in handy for some websites.

Implementation: adding new users or items, or increasing the rank of the factorization, just
enlarges the user and item feature matrices. Reducing the rank requires only one SVD. The real
challenge is that SGD is NOT a one-pass algorithm: multiple passes are required to reach
acceptable optimality, even more so if the hyperparameters are poorly chosen. Here are two
possible workarounds:

1. Use a one-pass algorithm such as averaged SGD. I am not sure it can work at all, since applying
a stochastic convex-optimization algorithm to a non-convex problem comes with no guarantees, so it
may be a long shot.

2. In each online update, run an incomplete pass over ratings sampled randomly (but not
uniformly) from the latest dataModel. I don't know exactly how this should be done, but new
ratings should be sampled more frequently: with uniform sampling, old ratings end up being used
more than new ratings in total, because they have been in the pool for more updates. If somebody
has worked on this kind of batch-to-online conversion before and can share their insight, that
would be awesome. This seems to be the most viable option, provided I can get the non-uniform
pseudorandom generator I want: one that keeps the cumulative sampling distribution uniform
across ratings.
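Both workarounds can be sketched in Java under explicit assumptions; all class and method names here are hypothetical, not Mahout APIs. AveragedSgdUserVector applies averaged SGD (Polyak-Ruppert averaging of the iterates) to one user vector while the item vectors are held fixed. That subproblem is convex ridge regression, so averaging is well founded there, whereas for the joint non-convex factorization it is not, which is the concern raised in option 1. DeficitSampler is one simple scheme of my own for option 2, not a known standard algorithm. Each rating is drawn with weight (maxUsage - usage + 1), so newly added, rarely used ratings are picked far more often until every rating's cumulative usage is roughly equal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Option 1 (hypothetical sketch): averaged SGD on one user vector, items fixed. */
class AveragedSgdUserVector {
  private final double[] p;       // current SGD iterate
  private final double[] average; // running mean of all iterates so far
  private final double learningRate;
  private final double lambda;
  private long steps = 0;

  AveragedSgdUserVector(int rank, double learningRate, double lambda) {
    this.p = new double[rank];
    this.average = new double[rank];
    this.learningRate = learningRate;
    this.lambda = lambda;
  }

  /** One SGD step on a rating against a fixed item vector q. */
  void step(double[] q, double rating) {
    double err = rating;
    for (int f = 0; f < p.length; f++) {
      err -= p[f] * q[f];
    }
    steps++;
    for (int f = 0; f < p.length; f++) {
      p[f] += learningRate * (err * q[f] - lambda * p[f]);
      average[f] += (p[f] - average[f]) / steps; // incremental mean of iterates
    }
  }

  double[] current() { return p.clone(); }
  double[] averaged() { return average.clone(); }
}

/** Option 2 (hypothetical sketch): usage-deficit-weighted sampling of ratings.
 *  O(n) per draw; a Fenwick tree over the weights would make it O(log n). */
class DeficitSampler {
  private final List<Long> usageCounts = new ArrayList<>();
  private final Random random;
  private long maxUsage = 0;

  DeficitSampler(long seed) {
    this.random = new Random(seed);
  }

  /** Registers a newly arrived rating and returns its index. */
  int add() {
    usageCounts.add(0L);
    return usageCounts.size() - 1;
  }

  long usage(int index) {
    return usageCounts.get(index);
  }

  /** Picks one rating index with probability proportional to its usage deficit. */
  int next() {
    if (usageCounts.isEmpty()) {
      throw new IllegalStateException("no ratings to sample");
    }
    long total = 0;
    for (long u : usageCounts) {
      total += maxUsage - u + 1;
    }
    long target = (long) (random.nextDouble() * total);
    int chosen = usageCounts.size() - 1; // fallback guards against rounding at the top end
    for (int i = 0; i < usageCounts.size(); i++) {
      target -= maxUsage - usageCounts.get(i) + 1;
      if (target < 0) {
        chosen = i;
        break;
      }
    }
    long updated = usageCounts.get(chosen) + 1;
    usageCounts.set(chosen, updated);
    maxUsage = Math.max(maxUsage, updated);
    return chosen;
  }
}
```

In each online update one would call next() a fixed number of times and run one SGD step on each returned rating. A rating added after 1000 draws of an older one is then picked almost exclusively until the two usage counts meet, after which they stay close.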

I found a very old ticket (MAHOUT-572) mentioning an online SVD recommender, but it didn't pay
off. Hopefully it's not a bad idea to submit a new ticket.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
