flink-issues mailing list archives

From "Sachin Goel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1537) GSoC project: Machine learning with Apache Flink
Date Mon, 09 Mar 2015 15:44:38 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353130#comment-14353130 ]

Sachin Goel commented on FLINK-1537:
------------------------------------

Hi Till,

Thanks for such a detailed description. I have already started going through the
documentation and, yes, I can begin implementing random forests right away. However, it
would be more prudent to implement a decision tree first: it generalizes easily to a forest,
leaving only a few implementation issues, e.g. bootstrapping and selecting random attributes,
which we can work out depending on which APIs Flink provides.

I would also like to work on a framework for deep learning on distributed systems. Most deep
learning is currently done only on GPUs. I thought we could discuss how that would proceed.
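A minimal sketch of the two sampling steps mentioned above that turn a single decision-tree learner into a forest: a bootstrap sample of rows and a random subset of attributes for each tree. It is plain Java with no Flink dependency; the class `ForestSampling` and its methods are hypothetical names chosen for illustration. Drawing the attribute subset once per tree is a simplification (closer to the random subspace method); Breiman's random forests redraw the subset at every split.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical helper showing the sampling that generalizes one
 * decision-tree learner into a random forest. The tree learner itself is omitted.
 */
public class ForestSampling {

    /** Draws n row indices uniformly with replacement (a bootstrap sample of size n). */
    public static int[] bootstrapIndices(int n, Random rng) {
        int[] indices = new int[n];
        for (int i = 0; i < n; i++) {
            indices[i] = rng.nextInt(n);
        }
        return indices;
    }

    /** Picks k distinct attribute indices out of numAttributes via a partial Fisher-Yates shuffle. */
    public static int[] randomAttributes(int numAttributes, int k, Random rng) {
        if (k < 0 || k > numAttributes) {
            throw new IllegalArgumentException("k must be in [0, numAttributes]");
        }
        int[] pool = new int[numAttributes];
        for (int i = 0; i < numAttributes; i++) {
            pool[i] = i;
        }
        // Swap a random remaining attribute into each of the first k slots.
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(numAttributes - i);
            int tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        return Arrays.copyOf(pool, k);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int numRows = 10;
        int numAttributes = 8;
        // sqrt(numAttributes) is a common default subset size for classification forests.
        int attributesPerTree = (int) Math.round(Math.sqrt(numAttributes));
        for (int tree = 0; tree < 3; tree++) {
            int[] rows = bootstrapIndices(numRows, rng);
            int[] attrs = randomAttributes(numAttributes, attributesPerTree, rng);
            // A full implementation would train one decision tree on (rows, attrs) here.
            System.out.println("tree " + tree + ": rows=" + Arrays.toString(rows)
                    + " attrs=" + Arrays.toString(attrs));
        }
    }
}
```

The single-tree learner is deliberately left out; how these per-tree samples would be distributed over Flink's APIs is the open question raised in the comment.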

> GSoC project: Machine learning with Apache Flink
> ------------------------------------------------
>
>                 Key: FLINK-1537
>                 URL: https://issues.apache.org/jira/browse/FLINK-1537
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Till Rohrmann
>            Priority: Minor
>              Labels: gsoc2015, java, machine_learning, scala
>
> Currently, the Flink community is setting up the infrastructure for a machine learning
> library for Flink. The goal is to provide a set of highly optimized ML algorithms and to offer
> a high-level linear algebra abstraction to easily do data pre- and post-processing. By defining
> a set of commonly used data structures on which the algorithms work, it will be possible to
> define complex processing pipelines.
> The Mahout DSL is a good fit for the linear algebra language in Flink. It remains to be
> evaluated which means have to be provided to allow an easy transition between the high-level
> abstraction and the optimized algorithms.
> The machine learning library offers multiple starting points for a GSoC project. Amongst
> others, the following projects are conceivable.
> * Extension of Flink's machine learning library with additional ML algorithms
> ** Stochastic gradient descent
> ** Distributed dual coordinate ascent
> ** SVM
> ** Gaussian mixture EM
> ** Decision trees
> ** ...
> * Integration of Flink with the Mahout DSL to support a high-level linear algebra abstraction
> * Integration of H2O with Flink to benefit from H2O's sophisticated machine learning
> algorithms
> * Implementation of a parameter-server-like distributed global state storage facility
> for Flink. This also includes extending Flink to support asynchronous iterations and
> update messages.
> Your own ideas for possible contributions in the field of the machine learning library are
> highly welcome.
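The last project idea in the issue description, a parameter-server-like global state store with asynchronous update messages, can be illustrated with a toy single-JVM sketch. This is not a Flink API and not a distributed implementation: the class `ToyParameterServer` and its `pull`/`push` methods are hypothetical, and a `ConcurrentHashMap` stands in for the remote storage. Workers pull a possibly stale value, compute a gradient step for the hypothetical loss 0.5 * (w0 - 1)^2, and push an additive update without waiting for each other, which is the asynchronous behaviour the item describes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-process stand-in for a parameter-server-like global state store. */
public class ToyParameterServer {
    private final ConcurrentHashMap<String, Double> params = new ConcurrentHashMap<>();

    /** Returns the current value of a parameter, or 0.0 if it was never written. */
    public double pull(String key) {
        return params.getOrDefault(key, 0.0);
    }

    /** Applies an additive update message; merge is atomic per key in ConcurrentHashMap. */
    public void push(String key, double delta) {
        params.merge(key, delta, Double::sum);
    }

    /** Copies the current state for inspection. */
    public Map<String, Double> snapshot() {
        return new HashMap<>(params);
    }

    public static void main(String[] args) throws InterruptedException {
        ToyParameterServer server = new ToyParameterServer();
        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int w = 0; w < 4; w++) {
            workers.submit(() -> {
                for (int step = 0; step < 1000; step++) {
                    // Read and update are not atomic together: a worker may compute its
                    // update from a stale value, as in asynchronous parameter servers.
                    double w0 = server.pull("w0");
                    double gradient = w0 - 1.0; // derivative of 0.5 * (w0 - 1)^2
                    server.push("w0", -0.001 * gradient);
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(server.snapshot());
    }
}
```

Using additive updates keeps concurrent pushes commutative, so no update is lost even though workers never synchronize; only the staleness of the values they read varies.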



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
