flink-dev mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Kicking off the Machine Learning Library
Date Sun, 04 Jan 2015 18:14:38 GMT
The idea to work with H2O sounds really interesting.

In terms of the Mahout DSL this would mean that we have to translate a
Flink dataset into H2O's basic abstraction of distributed data and vice
versa. Anything other than writing to disk with one system and reading the
data back with the other is probably non-trivial and hard to realize.
On Jan 4, 2015 9:18 AM, "Henry Saputra" <henry.saputra@gmail.com> wrote:

> Happy new year all!
>
> I like the idea of adding an ML module to Flink.
>
> As I have mentioned to Kostas, Stephan, and Robert before, I would
> love to see if we could work with the H2O project [1], and it seemed like
> the community has added support for it for Apache Mahout backend
> binding [2].
>
> So we might get some additional scale ML algos like deep learning.
>
> Definitely would love to help with this initiative =)
>
> - Henry
>
> [1] https://github.com/h2oai/h2o-dev
> [2] https://issues.apache.org/jira/browse/MAHOUT-1500
>
> On Fri, Jan 2, 2015 at 6:46 AM, Stephan Ewen <sewen@apache.org> wrote:
> > Hi everyone!
> >
> > Happy new year, first of all and I hope you had a nice end-of-the-year
> > season.
> >
> > I thought that it is a good time now to officially kick off the creation
> > of a library of machine learning algorithms. There are a lot of individual
> > artifacts and algorithms floating around which we should consolidate.
> >
> > The machine-learning library in Flink would stand on two legs:
> >
> >  - A collection of efficient implementations for common problems and
> > algorithms, e.g., Regression (logistic), clustering (k-Means, Canopy),
> > Matrix Factorization (ALS), ...
> >
> >  - An adapter to the linear algebra DSL in Apache Mahout.
> >
> > In the long run, it would be the goal to be able to mix and match code
> > from both parts.
> > The linear algebra DSL is very convenient when it comes to quickly
> > composing an algorithm, or some custom pre- and post-processing steps.
> > For some complex algorithms, however, a low level system specific
> > implementation is necessary to make the algorithm efficient.
> > Being able to call the tailored algorithms from the DSL would combine the
> > benefits.
> >
> >
> > As a concrete initial step, I suggest the following:
> >
> > 1) We create a dedicated maven sub-project for that ML library
> > (flink-lib-ml). The project gets two sub-projects, one for the collection
> > of specialized algorithms, one for the Mahout DSL.
> >
> > 2) We add the code for the existing specialized algorithms. As followup
> > work, we need to consolidate data types between those algorithms, to
> > ensure that they can easily be combined/chained.
> >
> > 3) The code for the Flink bindings to the Mahout DSL will actually reside
> > in the Mahout project, which we need to add as a dependency to
> > flink-lib-ml.
> >
> > 4) We add some examples of Mahout DSL algorithms, and a template how to
> > use them within Flink programs.
> >
> > 5) Create a good introductory readme.md, outlining this structure. The
> > readme can also track the implemented algorithms and the ones we put on
> > the roadmap.
> >
> >
> > Comments welcome :-)
> >
> >
> > Greetings,
> > Stephan
>
