flink-dev mailing list archives

From Gábor Hermann <m...@gaborhermann.com>
Subject Re: [DISCUSS] Flink ML roadmap
Date Mon, 20 Feb 2017 11:47:54 GMT
Hi Stavros,

Thanks for bringing this up.

There have been past [1] and recent [2, 3] discussions about the Flink
libraries, because there are some stalling PRs and overloaded
committers. (Actually, Till is the only committer shepherding both
the CEP and ML libraries, and AFAIK he has a ton of other responsibilities
and work to do.) Thus it's hard to get code reviewed and merged, and
without merged code it's hard to gain committer status, so there are
not many committers who can review e.g. ML algorithm implementations,
and the cycle goes on. Until this is resolved somehow, we should help
the committers by reviewing each other's PRs.

I think prioritizing features (b) is a good way to start. We could
declare the most blocking features and concentrate on reviewing and merging
them before moving forward. E.g. the evaluation framework is quite
important for an ML library in my opinion, and its PR has been stalling
for a long time [4].

Regarding c), there are general style guides for contributing to
Flink, so we should follow those. Is there something more ML-specific you
think we could follow? We should definitely declare that we follow
scikit-learn and make sure contributions comply with that.

In terms of features (a, d), I think we should first see the bigger
picture. That is, it would be nice to discuss a clearer direction for
Flink ML. I've seen a lot of interest in contributing to Flink ML
lately. I believe we should rethink our goals, so that contribution
efforts go into making a usable and useful library. Are we trying to
implement as many useful algorithms as possible to create a scalable ML
library? That would seem ambitious, and of course there are a lot of
frameworks and libraries that already have something like this as a goal
(e.g. Spark MLlib, Mahout). Should we rather create connectors to
existing libraries? Then we could not really do Flink-specific
optimizations. Should we go for online machine learning (as Flink is
concentrating on streaming)? We already have a connector to SAMOA. We
could go on with questions like this. Maybe I'm missing something, but I
haven't seen such directions declared.

Cheers,
Gabor

[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Opening-a-discussion-on-FlinkML-td10265.html
[2] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Flink-CEP-development-is-stalling-td15237.html#a15341
[3] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/New-Flink-team-member-Kate-Eri-td15349.html
[4] https://github.com/apache/flink/pull/1849

On 2017-02-20 11:43, Stavros Kontopoulos wrote:

> (Resending with the appropriate topic)
>
> Hi,
>
> I would like to start a discussion about next steps for Flink ML.
> Currently there is a lot of work going on, but it needs a push forward.
>
> Some topics to discuss:
>
> a) How several features should be planned and get aligned with Flink
> releases.
> b) Priorities of what should be done.
> c) Basic guidelines for code: styleguides, scikit-learn compliance etc
> d) Missing features important for the success of the library, next steps
> etc...
>
> Thoughts?
>
> Best,
> Stavros
>

