flink-issues mailing list archives

From gaborhermann <...@git.apache.org>
Subject [GitHub] flink issue #2838: [FLINK-4712] [FLINK-4713] [ml] Ranking recommendation & e...
Date Thu, 24 Nov 2016 14:04:03 GMT
Github user gaborhermann commented on the issue:

    Thanks again for taking a look at our PR!
    I've just realized from a developer mailing list thread that the FlinkML API is still
not carved in stone until 2.0, and it's nice to hear that :)
    The problem is not with `evaluate(test: TestType): DataSet[Double]` but rather with
`evaluate(test: TestType): DataSet[(Prediction,Prediction)]`. It's at least confusing to have
both, and it might not be worth exposing the one that gives `(Prediction,Prediction)` pairs to
the user, as it only *prepares* evaluation. When introducing the evaluation framework, we could
at least rename it to something like `preparePairwiseEvaluation(test: TestType): DataSet[(Prediction,Prediction)]`.
In the ranking case we might generalize it to `prepareEvaluation(test: TestType): PreparedTesting`.
We basically did this with the `PrepareDataSetOperation`; we've just left the old `evaluate`
as it is for now. I suggest changing this if we can break the API.
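    To make the proposed split concrete, here is a minimal sketch, assuming simplified stand-in types: `DataSet` is stubbed locally, and `PreparedTesting`, `PairwiseTesting`, and `Evaluator` are illustrative names rather than the actual FlinkML API.

```scala
// Stub standing in for Flink's DataSet, to keep the sketch self-contained.
case class DataSet[T](elements: Seq[T])

// Marker trait for the output of a prepare step.
trait PreparedTesting

// (truth, prediction) pairs that are ready for scoring but not yet scored.
case class PairwiseTesting[P](pairs: DataSet[(P, P)]) extends PreparedTesting

trait Evaluator[TestType, P] {
  // Only *prepares* evaluation: builds the pairs, computes no metric.
  def prepareEvaluation(test: TestType): PreparedTesting

  // Convenience entry point: prepare, then apply a default metric.
  def evaluate(test: TestType): DataSet[Double]
}

// Toy implementation whose "model" predicts the identity.
class IdentityEvaluator extends Evaluator[Seq[Double], Double] {
  def prepareEvaluation(test: Seq[Double]): PreparedTesting =
    PairwiseTesting(DataSet(test.map(x => (x, x))))
  def evaluate(test: Seq[Double]): DataSet[Double] =
    DataSet(Seq(0.0)) // identity predictions have zero error
}
```

    The point of the naming is that `prepareEvaluation` advertises it returns intermediate pairs, while `evaluate` stays the one-call path to a score.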
    I'll do a rebase on the cross-validation PR. At first glance, supporting both cross-validation
and hyper-parameter tuning should not really be a problem, as the user has to provide
a `Scorer` anyway. A minor issue I see is the user falling back to a default `score` (e.g.
RMSE in the case of ALS). This is not a problem for recommendation models that give rating
predictions besides ranking predictions, but it is a problem for models that *only* give ranking
predictions, because those do not extend the `Predictor` class. This is not an issue for now,
but it might become one when adding more recommendation models. Should we try to handle this now,
or would that be overengineering? I'll see if any other problem comes up after rebasing.
    The `RankingPredictor` interface is useful *internally* for the `Score`s. It serves as a
contract between a `RankingScore` and the model. I'm fairly sure it will only be used for recommendations,
but it takes no effort to expose it, so the user can write code against a general `RankingPredictor`
(although I don't think this is what users would typically want to do :) ). A better question is
whether to use it in a `Pipeline`. We discussed this with some people and could not really
find a use-case where we need `Transformer`-like preprocessing for recommendations. Of course,
there could be other preprocessing steps, such as removing or aggregating duplicates, but those
do not have to be `fit` to training data. Based on this, it's not worth the effort to integrate
`RankingPredictor` with the `Pipeline`, at least for now.
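    The contract idea can be sketched as follows, assuming simplified, hypothetical signatures (`recommend` here takes a single user, and `PrecisionAtK` is an illustrative metric, not the PR's exact API).

```scala
// The contract: any ranking model can produce top-k recommendations.
trait RankingPredictor {
  def recommend(user: Int, k: Int): Seq[Int]
}

// A score written only against the contract, e.g. precision@k: it needs
// nothing from the model beyond `recommend`, so any ranking model works.
class PrecisionAtK(k: Int) {
  def score(model: RankingPredictor,
            relevant: Map[Int, Set[Int]]): Double = {
    val perUser = relevant.map { case (user, rel) =>
      model.recommend(user, k).count(rel.contains).toDouble / k
    }
    perUser.sum / perUser.size
  }
}
```

    Because the score depends only on the trait, exposing `RankingPredictor` costs nothing even if users rarely program against it directly.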

