flink-dev mailing list archives

From Gábor Hermann (JIRA) <j...@apache.org>
Subject [jira] [Created] (FLINK-4613) Extend ALS to handle implicit feedback datasets
Date Mon, 12 Sep 2016 07:50:22 GMT
Gábor Hermann created FLINK-4613:
------------------------------------

             Summary: Extend ALS to handle implicit feedback datasets
                 Key: FLINK-4613
                 URL: https://issues.apache.org/jira/browse/FLINK-4613
             Project: Flink
          Issue Type: New Feature
          Components: Machine Learning Library
            Reporter: Gábor Hermann
            Assignee: Gábor Hermann


The Alternating Least Squares (ALS) implementation should be extended to handle _implicit feedback_
datasets. These datasets do not contain explicit ratings by users. Instead, they are built by
collecting user behavior (e.g. user listened to artist X for Y minutes), and they require
a slightly different optimization objective. For details, see [Hu et al|http://dx.doi.org/10.1109/ICDM.2008.22].
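
For reference, the objective from the paper linked above can be written roughly as follows. The symbols are taken from that paper, not from the Flink code: r_ui is the raw observation (e.g. minutes listened), p_ui the binary preference, c_ui the confidence weight, alpha and lambda are tuning parameters, and x_u, y_i are the user and item factor vectors. The linear confidence c_ui = 1 + alpha * r_ui is one of the choices the paper discusses.

```latex
\min_{x_*, y_*} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^T y_i \right)^2
  + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right),
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha \, r_{ui}
```

Unlike the explicit case, the sum runs over all (u, i) pairs, including unobserved ones.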

The original ALS algorithm needs little modification. The [Spark ALS implementation|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala]
could serve as a basis for this extension. Only the factor update step changes, and
most of the changes are in the local parts of the algorithm (i.e. UDFs). In fact, the only
modification that is not local is precomputing the matrix product Y^T * Y and broadcasting
it to all the nodes, which we can do with broadcast DataSets.
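
To illustrate why Y^T * Y is the only non-local piece, here is a minimal single-machine sketch of one user-factor update in plain Python. It is not Flink or Spark code. The function names (gram, solve, update_user) are hypothetical, and the confidence weight c = 1 + alpha * r is assumed from Hu et al. The point is that once Y^T * Y is available, each update only touches the items the user actually interacted with.

```python
def gram(Y):
    """Compute Y^T * Y for a list of item factor rows.

    In a distributed setting this is the k x k matrix that would be
    precomputed once per iteration and broadcast to all nodes.
    """
    k = len(Y[0])
    return [[sum(row[a] * row[b] for row in Y) for b in range(k)]
            for a in range(k)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def update_user(yty, Y, observed, alpha, lam):
    """Recompute one user's factor vector (local part of the update).

    yty      -- precomputed Y^T * Y (the broadcast matrix)
    Y        -- item factor rows
    observed -- dict: item index -> raw feedback r > 0 for this user
    alpha    -- confidence scaling, c = 1 + alpha * r (assumed form)
    lam      -- regularization parameter
    """
    k = len(yty)
    # Start from Y^T Y + lam * I, which is the same for every user.
    A = [[yty[a][b] + (lam if a == b else 0.0) for b in range(k)]
         for a in range(k)]
    rhs = [0.0] * k
    # Add the user-specific correction Y^T (C_u - I) Y and build
    # Y^T C_u p_u, looping over observed items only.
    for i, r in observed.items():
        c = 1.0 + alpha * r
        y = Y[i]
        for a in range(k):
            rhs[a] += c * y[a]
            for b in range(k):
                A[a][b] += (c - 1.0) * y[a] * y[b]
    return solve(A, rhs)
```

A call such as update_user(gram(Y), Y, {0: 3.0, 2: 1.0}, alpha=2.0, lam=0.1) returns the new factor vector for a user who interacted with items 0 and 2. The item-factor update is symmetric, with X^T * X in place of Y^T * Y.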



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
