spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <>
Subject [jira] [Closed] (SPARK-6711) Support parallelized online matrix factorization for Collaborative Filtering
Date Mon, 06 Apr 2015 19:46:12 GMT


Xiangrui Meng closed SPARK-6711.
    Resolution: Duplicate

> Support parallelized online matrix factorization for Collaborative Filtering 
> -----------------------------------------------------------------------------
>                 Key: SPARK-6711
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, Streaming
>            Reporter: Chunnan Yao
>   Original Estimate: 840h
>  Remaining Estimate: 840h
> Online collaborative filtering (CF) has been widely used and studied. Retraining a CF
> model from scratch every time new data arrives is very inefficient (
> However, the Spark community has had little discussion of collaborative filtering on
> streaming data. Given streaming k-means, streaming logistic regression, and the ongoing
> work on incremental model training for the Naive Bayes classifier (SPARK-4144), we think
> it is worth considering streaming collaborative filtering support in MLlib.
> We have been looking into this issue over the past week. We plan to refer to this paper
> ( It is based on SGD instead of ALS, which makes it easier to handle streaming data.
> Fortunately, the authors of this paper have implemented their algorithm as a GitHub
> project, based on Storm:
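
To illustrate why an SGD-based factorization suits streaming data better than ALS, here is a minimal single-machine sketch in Python. It is not the algorithm from the referenced paper and not an MLlib API: the class name OnlineMF, its parameters (rank, lr, reg), and the update rule (plain regularized SGD on squared error) are assumptions chosen for illustration. The point is only that each incoming rating touches one user vector and one item vector, so the model can be updated as events arrive instead of being retrained from scratch.

```python
import random


class OnlineMF:
    """Hypothetical online matrix factorization, updated one rating at a time."""

    def __init__(self, rank=8, lr=0.05, reg=0.02, seed=0):
        self.rank = rank          # number of latent factors (assumed value)
        self.lr = lr              # SGD step size (assumed value)
        self.reg = reg            # L2 regularization strength (assumed value)
        self.rng = random.Random(seed)
        self.user_factors = {}    # user id -> latent vector
        self.item_factors = {}    # item id -> latent vector

    def _factors(self, table, key):
        # Lazily create a small random vector for users/items not seen before.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.rank)]
        return table[key]

    def predict(self, user, item):
        p = self._factors(self.user_factors, user)
        q = self._factors(self.item_factors, item)
        return sum(a * b for a, b in zip(p, q))

    def update(self, user, item, rating):
        """Apply one SGD step for a single (user, item, rating) event.

        Returns the prediction error measured before the step.
        """
        p = self._factors(self.user_factors, user)
        q = self._factors(self.item_factors, item)
        err = rating - sum(a * b for a, b in zip(p, q))
        for k in range(self.rank):
            pk, qk = p[k], q[k]
            p[k] += self.lr * (err * qk - self.reg * pk)
            q[k] += self.lr * (err * pk - self.reg * qk)
        return err


# Usage: feed ratings as they arrive; only the touched vectors change.
model = OnlineMF(rank=4, seed=1)
for user, item, rating in [("u1", "i1", 4.0), ("u2", "i1", 3.0), ("u1", "i2", 5.0)]:
    model.update(user, item, rating)
```

By contrast, ALS alternates full least-squares solves over all users and then all items, which is why it is usually run as a batch job. Parallelizing the SGD updates across a cluster, the subject of this issue, is not addressed by this sketch.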
