flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4613) Extend ALS to handle implicit feedback datasets
Date Thu, 29 Sep 2016 11:19:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532501#comment-15532501 ]

ASF GitHub Bot commented on FLINK-4613:

Github user gaborhermann commented on a diff in the pull request:

    --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala
    @@ -581,6 +637,16 @@ object ALS {
             val userXy = new ArrayBuffer[Array[Double]]()
             val numRatings = new ArrayBuffer[Int]()
    +        var precomputedXtX: Array[Double] = null
    +        override def open(config: Configuration): Unit = {
    +          // retrieve broadcasted precomputed XtX if using implicit feedback
    +          if (implicitPrefs) {
    +            precomputedXtX = getRuntimeContext.getBroadcastVariable[Array[Double]]("XtX")
    +              .iterator().next()
    +          }
    +        }
             override def coGroup(left: lang.Iterable[(Int, Int, Array[Array[Double]])],
    --- End diff --
    If I see it right, I did not change this line; it was in the original ALS implementation.
That said, I can't find any reason to use the Java `Iterable` here.
    There could be other minor things to refactor in the original ALS code, but I preferred
to keep them as they were so as not to break anything. Should I refactor such parts along
the way when I extend an algorithm like this?

> Extend ALS to handle implicit feedback datasets
> -----------------------------------------------
>                 Key: FLINK-4613
>                 URL: https://issues.apache.org/jira/browse/FLINK-4613
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Gábor Hermann
>            Assignee: Gábor Hermann
> The Alternating Least Squares implementation should be extended to handle _implicit feedback_
datasets. These datasets do not contain explicit ratings by users; instead, they are built by
collecting user behavior (e.g. user listened to artist X for Y minutes), and they require
a slightly different optimization objective. See details by [Hu et al|http://dx.doi.org/10.1109/ICDM.2008.22].
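For orientation, the objective can be sketched as follows. This is my reading of the formulation in Hu et al.; the symbols are that paper's notation and do not appear in this issue: r_ui is the observed interaction strength, alpha a confidence scaling constant, lambda the regularization weight, and x_u, y_i the user and item factor vectors.

```latex
\min_{x_*,\, y_*} \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{T} y_i\bigr)^2
  + \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr),
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha\, r_{ui}
```

Unlike explicit-feedback ALS, the sum runs over all (u, i) pairs, not only the observed ones, which is why the factor update needs the extra precomputation described in the issue.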
> We do not need to modify much in the original ALS algorithm. See the [Spark ALS implementation|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala],
which could be a basis for this extension. Only the factor update step is modified, and
most of the changes are in the local parts of the algorithm (i.e. UDFs). In fact, the only
modification that is not local is precomputing the matrix product Y^T * Y and broadcasting
it to all the nodes, which we can do with broadcast DataSets.
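To illustrate why precomputing Y^T * Y helps, here is a minimal plain-Scala sketch with no Flink dependency. It is not the code from the pull request: the object and method names (`ImplicitAlsSketch`, `gram`, `normalMatrix`, `normalMatrixNaive`) are hypothetical, and it assumes the standard implicit-feedback identity Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y, where unobserved items have confidence 1. With that identity, the per-user matrix only needs corrections for the items the user actually interacted with.

```scala
// Hypothetical sketch, not the PR's code: shows the precomputed Y^T * Y trick locally.
object ImplicitAlsSketch {
  type Matrix = Array[Array[Double]]

  /** Gram matrix Y^T * Y, where each row of `y` is one item's factor vector of length k. */
  def gram(y: Matrix, k: Int): Matrix = {
    val out = Array.ofDim[Double](k, k)
    for (row <- y; a <- 0 until k; b <- 0 until k) out(a)(b) += row(a) * row(b)
    out
  }

  /** Y^T C^u Y + lambda * I for one user, starting from the precomputed Y^T * Y.
    * `observed` holds (itemIndex, confidence) pairs; all other items have confidence 1,
    * so only observed items contribute the correction (c - 1) * y_i * y_i^T. */
  def normalMatrix(yty: Matrix, y: Matrix, observed: Seq[(Int, Double)], lambda: Double): Matrix = {
    val k = yty.length
    val a = yty.map(_.clone())
    for ((item, c) <- observed; p <- 0 until k; q <- 0 until k)
      a(p)(q) += (c - 1.0) * y(item)(p) * y(item)(q)
    for (d <- 0 until k) a(d)(d) += lambda
    a
  }

  /** Reference version that sums over every item; used only to check the fast path. */
  def normalMatrixNaive(y: Matrix, confidence: Array[Double], lambda: Double): Matrix = {
    val k = y.head.length
    val a = Array.ofDim[Double](k, k)
    for (i <- y.indices; p <- 0 until k; q <- 0 until k)
      a(p)(q) += confidence(i) * y(i)(p) * y(i)(q)
    for (d <- 0 until k) a(d)(d) += lambda
    a
  }
}
```

In a distributed setting, `gram` corresponds to the part that would be computed once per iteration and shipped to every node (the diff above reads it from a broadcast variable named "XtX"), while `normalMatrix` corresponds to the local work inside the UDF.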
