flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS
Date Fri, 25 Nov 2016 16:36:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696250#comment-15696250 ]

ASF GitHub Bot commented on FLINK-4712:

Github user thvasilo commented on the issue:

    > The problem is not with the evaluate(test: TestType): DataSet[Double] but rather
with evaluate(test: TestType): DataSet[(Prediction,Prediction)].
    Completely agree there. I advocated for removing or renaming the `evaluate` function; we
considered using a `score` function for a more sklearn-like approach before, see e.g. #902. Having
_some_ function that returns a `DataSet[(truth: Prediction, pred: Prediction)]` is useful and
probably necessary, but we should look at alternatives, as the current state is confusing.
    I think I like the approach you are suggesting, so feel free to come up with an alternative
in the WIP PRs.
    Getting rid of the Pipeline requirements for recommendation algorithms would simplify
some things. In that case we'll have to re-evaluate whether it makes sense for them to implement
the `Predictor` interface at all. Alternatively, we could have `ChainablePredictors`, but I think
our hierarchy is deep enough already.
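
To make the `evaluate` vs. `score` distinction concrete, here is a minimal sketch using plain Scala collections instead of `DataSet`. All names (`EvalSketch`, `Model`, the method signatures) are hypothetical and are not the FlinkML API; the choice of mean squared error as the score is also just an assumption for illustration.

```scala
// Hypothetical sketch; none of these names come from FlinkML.
object EvalSketch {
  // Stand-in for a trained model: maps a feature value to a prediction.
  type Model = Double => Double

  // Pairs each true label with the model's prediction: (truth, prediction).
  // Test data is given as (feature, truth) pairs.
  def evaluate(model: Model, test: Seq[(Double, Double)]): Seq[(Double, Double)] =
    test.map { case (feature, truth) => (truth, model(feature)) }

  // Reduces the (truth, prediction) pairs to a single number.
  // Mean squared error is used here purely as an example metric.
  def score(model: Model, test: Seq[(Double, Double)]): Double = {
    val pairs = evaluate(model, test)
    pairs.map { case (t, p) => (t - p) * (t - p) }.sum / pairs.size
  }
}
```

With this split, the function returning pairs and the function returning a single number have different names, which is roughly the sklearn-like separation mentioned above.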

> Implementing ranking predictions for ALS
> ----------------------------------------
>                 Key: FLINK-4712
>                 URL: https://issues.apache.org/jira/browse/FLINK-4712
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Domokos Miklós Kelen
>            Assignee: Gábor Hermann
> We started working on implementing ranking predictions for recommender systems. Ranking
prediction means that besides predicting scores for user-item pairs, the recommender system
is able to recommend a top-K list for the users.
> Details:
> In practice, this would mean finding the K items for a particular user with the highest
predicted rating. It should also be possible to specify whether to exclude the already seen
items from a particular user's toplist. (See for example the 'exclude_known' setting of [Graphlab
Create's ranking factorization recommender|https://turi.com/products/create/docs/generated/graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend.html#graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend].)
> The output of the topK recommendation function could be in the form of {{DataSet[(Int,Int,Int)]}},
meaning (user, item, rank), similar to Graphlab Create's output. However, this is open to debate:
follow-up work includes implementing ranking recommendation evaluation metrics (such as precision@k,
recall@k, ndcg@k), similar to [Spark's implementations|https://spark.apache.org/docs/1.5.0/mllib-evaluation-metrics.html#ranking-systems].
It would be beneficial if we were able to design the API such that it could be included in
the proposed evaluation framework (see [2157|https://issues.apache.org/jira/browse/FLINK-2157]),
which makes it necessary to consider the possible output type {{DataSet[(Int, Array[Int])]}}
or {{DataSet[(Int, Array[(Int,Double)])]}}, meaning (user, array of items), possibly including
the predicted scores as well. See [4713|https://issues.apache.org/jira/browse/FLINK-4713]
for details.
> Another question arising is whether to provide this function as a member of the ALS class,
as a switch-kind of parameter to the ALS implementation (meaning the model is either a rating
or a ranking recommender model) or in some other way.
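
The top-K selection and the two candidate output shapes described in the issue can be sketched as follows. This is a minimal illustration on plain Scala collections rather than `DataSet`, and every name in it (`TopKSketch`, `topK`, `toLists`, `precisionAtK`, the `known` parameter) is hypothetical. `Seq` stands in for `Array` so that results compare by value, ties are broken by item id as an arbitrary assumption, and the precision@k shown (hits in the first k positions divided by k) is one common definition rather than a settled choice for this issue.

```scala
// Hypothetical sketch; not the FlinkML ALS API.
object TopKSketch {
  // (user, item, predicted score)
  type Scored = (Int, Int, Double)

  // Returns (user, item, rank) triples, with rank starting at 1.
  // Pairs listed in `known` are dropped first, mimicking an 'exclude_known' switch.
  def topK(scored: Seq[Scored], k: Int,
           known: Set[(Int, Int)] = Set.empty): Seq[(Int, Int, Int)] =
    scored
      .filterNot { case (user, item, _) => known.contains((user, item)) }
      .groupBy(_._1)
      .toSeq
      .sortBy(_._1) // deterministic user order
      .flatMap { case (user, rows) =>
        rows
          // highest score first; ties broken by smaller item id
          .sortWith((a, b) => a._3 > b._3 || (a._3 == b._3 && a._2 < b._2))
          .take(k)
          .zipWithIndex
          .map { case ((_, item, _), idx) => (user, item, idx + 1) }
      }

  // Regroups (user, item, rank) triples into user -> items in rank order,
  // i.e. the (user, array of items) shape.
  def toLists(ranked: Seq[(Int, Int, Int)]): Map[Int, Seq[Int]] =
    ranked.groupBy(_._1).map { case (user, rows) =>
      user -> rows.sortBy(_._3).map(_._2)
    }

  // Fraction of the first k recommended items that are relevant.
  def precisionAtK(recommended: Seq[Int], relevant: Set[Int], k: Int): Double =
    recommended.take(k).count(relevant.contains).toDouble / k
}
```

For example, `TopKSketch.toLists(TopKSketch.topK(scores, 2, seen))` converts between the two shapes, which suggests the (user, item, rank) form does not rule out feeding list-based ranking metrics later.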

This message was sent by Atlassian JIRA
