Date: Thu, 21 May 2015 09:11:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Commented] (FLINK-2034) Add vision and roadmap for ML library to docs


    [ https://issues.apache.org/jira/browse/FLINK-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553946#comment-14553946 ]

ASF GitHub Bot commented on FLINK-2034:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/688#discussion_r30784829

    --- Diff: docs/libs/ml/index.md ---
    @@ -20,8 +20,100 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    +The Machine Learning (ML) library for Flink is a new effort to bring scalable ML tools to the Flink
    +community. Our goal is is to design and implement a system that is scalable and can deal with
    +problems of various sizes, whether your data size is measured in megabytes or terabytes and beyond.
    +We call this library FlinkML.
    +
    +An important concern for developers of ML systems is the amount of glue code that developers are
    +forced to write [1] in the process of implementing an end-to-end ML system. Our goal with FlinkML
    +is to help developers keep glue code to a minimum. The Flink ecosystem provides a great setting to
    +tackle this problem, with its scalable ETL capabilities that can be easily combined inside the same
    +program with FlinkML, allowing the development of robust pipelines without the need to use yet
    +another technology for data ingestion and data munging.
    +
    +Another goal for FlinkML is to make the library easy to use. To that end we will be providing
    +detailed documentation along with examples for every part of the system. Our aim is that developers
    +will be able to get started with writing their ML pipelines quickly, using familiar programming
    +concepts and terminology.
    +
    +Contrary to other data-processing systems, Flink exploits in-memory data streaming, and natively
    +executes iterative processing algorithms which are common in ML. We plan to exploit the streaming
    +nature of Flink, and provide functionality designed specifically for data streams.
    +
    +FlinkML will allow data scientists to test their models locally and using subsets of data, and then
    +use the same code to run their algorithms at a much larger scale in a cluster setting.
    +
    +We are inspired by other open source efforts to provide ML systems, in particular
    +[scikit-learn](http://scikit-learn.org/) for cleanly specifying ML pipelines, and Spark’s
    +[MLLib](https://spark.apache.org/mllib/) for providing ML algorithms that scale with problem and
    +cluster sizes.
    +
    +We already have some of the building blocks for FlinkML in place, and will continue to extend the
    +library with more algorithms. An example of how simple it is to create a learning model in
    +FlinkML is given below:
    --- End diff --

    Good idea to show how to use FlinkML. I would extend the example a little bit and put it into a separate section. Maybe make a small tutorial out of it, including the things you have to add to your pom etc.


> Add vision and roadmap for ML library to docs
> ---------------------------------------------
>
>                 Key: FLINK-2034
>                 URL: https://issues.apache.org/jira/browse/FLINK-2034
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Theodore Vasiloudis
>            Assignee: Theodore Vasiloudis
>              Labels: ML
>             Fix For: 0.9
>
>
> We should have a document describing the vision of the Machine Learning library in Flink and an up-to-date roadmap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)