flink-dev mailing list archives

From Chiwan Park <chiwanp...@apache.org>
Subject Re: Opening a discussion on FlinkML
Date Fri, 12 Feb 2016 11:59:04 GMT

I agree with what Theo said. Currently, only a few committers spend time reviewing FlinkML PRs.
But I also agree with Fabian’s opinion. I would like to keep FlinkML in the main Flink
repository. I hope new committers will spend time on FlinkML.

About Simone’s opinion: yes, FlinkML is still an immature ML library. It lacks many
useful features, and some of those features are pending in pull requests.

Integration with other libraries such as Mahout, H2O, or Weka would also be good. There are
already some attempts to use Flink or another distributed data processing framework as a backend
for other libraries [1] [2] [3]. But I think, as you can see from the links, we would have to
re-implement many algorithms even if we integrated another library with Flink. I doubt there is
a big development advantage to integration.

[1]: https://issues.apache.org/jira/browse/MAHOUT-1570
[2]: http://mahout.apache.org/users/basics/algorithms.html
[3]: https://github.com/ariskk/distributedWekaSpark

Chiwan Park

> On Feb 12, 2016, at 7:04 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> Hi Theo,
> thanks for starting this discussion. You are certainly right that the
> development of FlinkML is stalling. On the other hand, we regularly see
> people on the mailing list asking for features.
> Regarding your proposed ways to proceed:
> 1) I am not sure how much it would help to move FlinkML to a separate
> repository.
> We have discussed moving connectors (and libraries) to separate
> repositories before, but the thread fell asleep [1].
> We would still need committers to spend time reviewing, merging, and
> contributing.
> So IMO, this is orthogonal to having more committer involvement.
> 2) Having committers (current / new ones) spend time on FlinkML is the
> requirement for keeping it alive within the Flink project.
> Adding new committers is kind of a bootstrap problem here because it is
> hard for contributors to get involved with FlinkML if very little committer
> time is spent on code reviews and merging. Nonetheless, I see this as the
> best option.
> 3) Forking a project on GitHub is certainly possible (even without the
> endorsement of the Flink community). However, merging changes back into
> Flink would again require a committer to review and merge (probably a much
> larger chunk of code) and also require the permission of all contributors.
> Best,
> Fabian
> [1]
> https://mail-archives.apache.org/mod_mbox/flink-dev/201512.mbox/%3CCAGco--aZhZhrrSzzPROwXwmtYmD5CkoGKe7xNCWG1Vw7V-D%2BaA%40mail.gmail.com%3E
> 2016-02-12 10:23 GMT+01:00 Theodore Vasiloudis <theodoros.vasiloudis@gmail.com>:
>> Hello all,
>> I would like to get a conversation started on how we plan to move forward
>> with FlinkML.
>> Development on the library has been mostly dormant for the past 6 months,
>> mainly, I believe, because of the lack of available committers to review PRs.
>> Last month we got together with Till and Marton and talked about how we
>> could try to solve this and ensure continued development of the library.
>> We see 3 possible paths we could take:
>>   1. Externalize the library, creating a new repository under the Apache
>>   Flink project. This decouples the development of FlinkML from the Flink
>>   release cycle, allowing us to move faster and incorporate new features as
>>   they become available. As FlinkML is a library under development, tying it
>>   to specific versions does not make much sense anyway. The library would
>>   depend on the latest snapshot version of Flink. It would then be possible
>>   for the Flink distribution to cherry-pick parts of the library to be
>>   included with the core distribution.
>>   2. Keep the development under the main Flink project but bring in new
>>   committers. This would mean that the development remains as is and is tied
>>   to core Flink releases, but new work should get merged at much more
>>   regular intervals through the help of committers other than Till. Marton
>>   Balassi has volunteered for that role and I hope that more might take up
>>   that role.
>>   3. A third option is to fork FlinkML on a repository on which we are
>>   able to commit freely (again through PRs and reviews of course) and merge
>>   good parts back into the main repo once in a while. This allows for faster
>>   progress and more experimental work but obviously creates fragmentation.
>> I would like to hear your thoughts on these three options, as well as
>> discuss other alternatives that could help move FlinkML forward.
>> Cheers,
>> Theodore
