Date: Mon, 9 Mar 2015 15:56:38 +0000 (UTC)
From: "Sachin Goel (JIRA)"
To: issues@flink.incubator.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Comment Edited] (FLINK-1537) GSoC project: Machine learning with Apache Flink

    [ https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353134#comment-14353134 ]

Sachin Goel edited comment on FLINK-1537 at 3/9/15 3:56 PM:
------------------------------------------------------------

Hi Till,
I really like the idea of asynchronous iterations and global state management. I think it can go a long way when working with batch update methods that update over data distributed across multiple machines, completing one pass over the data in a single step.
I have never played with job profiling and memory/time analysis on distributed systems, so this should be a really interesting topic to work on.


was (Author: sachingoel0101):
I really like the idea of asynchronous iterations and global state management. I think it can go a long way when working with batch update methods that update over data distributed across multiple machines, completing one pass over the data in a single step.
I have never played with job profiling and memory/time analysis on distributed systems, so this should be a really interesting topic to work on.

> GSoC project: Machine learning with Apache Flink
> ------------------------------------------------
>
>                 Key: FLINK-1537
>                 URL: https://issues.apache.org/jira/browse/FLINK-1537
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Till Rohrmann
>            Priority: Minor
>              Labels: gsoc2015, java, machine_learning, scala
>
> Currently, the Flink community is setting up the infrastructure for a machine learning library for Flink. The goal is to provide a set of highly optimized ML algorithms and to offer a high-level linear algebra abstraction that makes data pre- and post-processing easy. By defining a set of commonly used data structures on which the algorithms work, it will be possible to define complex processing pipelines.
> The Mahout DSL is a good fit for the linear algebra language in Flink. It remains to be evaluated what means must be provided to allow an easy transition between the high-level abstraction and the optimized algorithms.
> The machine learning library offers multiple starting points for a GSoC project. Among others, the following projects are conceivable:
> * Extension of Flink's machine learning library with additional ML algorithms
> ** Stochastic gradient descent
> ** Distributed dual coordinate ascent
> ** SVM
> ** Gaussian mixture EM
> ** Decision trees
> ** ...
> * Integration of Flink with the Mahout DSL to support a high-level linear algebra abstraction
> * Integration of H2O with Flink to benefit from H2O's sophisticated machine learning algorithms
> * Implementation of a parameter-server-like distributed global state storage facility for Flink. This also includes extending Flink to support asynchronous iterations and update messages.
> Your own ideas for possible contributions to the machine learning library are highly welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)