flink-commits mailing list archives

From trohrm...@apache.org
Subject [1/2] flink git commit: [FLINK-2272] [ml] Removed roadmap and vision from docs, added link to them in the wiki.
Date Thu, 02 Jul 2015 12:58:19 GMT
Repository: flink
Updated Branches:
  refs/heads/master a137321ac -> eb23f8074


[FLINK-2272] [ml] Removed roadmap and vision from docs, added link to them in the wiki.

This closes #864.


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/4cc7cf35
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/4cc7cf35
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/4cc7cf35

Branch: refs/heads/master
Commit: 4cc7cf35e156ede9b2c146ce61b6d931bfe921d7
Parents: a137321
Author: Theodore Vasiloudis <tvas@sics.se>
Authored: Wed Jun 24 13:55:38 2015 +0200
Committer: Till Rohrmann <trohrmann@apache.org>
Committed: Thu Jul 2 14:31:58 2015 +0200

----------------------------------------------------------------------
 docs/libs/ml/index.md          |  2 +-
 docs/libs/ml/vision_roadmap.md | 99 -------------------------------------
 2 files changed, 1 insertion(+), 100 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/4cc7cf35/docs/libs/ml/index.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/index.md b/docs/libs/ml/index.md
index e81b354..63cdf43 100644
--- a/docs/libs/ml/index.md
+++ b/docs/libs/ml/index.md
@@ -24,7 +24,7 @@ FlinkML is the Machine Learning (ML) library for Flink. It is a new effort in th
 with a growing list of algorithms and contributors. With FlinkML we aim to provide
 scalable ML algorithms, an intuitive API, and tools that help minimize glue code in end-to-end ML
 systems. You can see more details about our goals and where the library is headed in our [vision
-and roadmap here](vision_roadmap.html).
+and roadmap here](https://cwiki.apache.org/confluence/display/FLINK/FlinkML%3A+Vision+and+Roadmap).
 
 * This will be replaced by the TOC
 {:toc}

http://git-wip-us.apache.org/repos/asf/flink/blob/4cc7cf35/docs/libs/ml/vision_roadmap.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/vision_roadmap.md b/docs/libs/ml/vision_roadmap.md
deleted file mode 100644
index 24b651e..0000000
--- a/docs/libs/ml/vision_roadmap.md
+++ /dev/null
@@ -1,99 +0,0 @@
----
-htmlTitle: FlinkML - Vision and Roadmap
-title: <a href="../ml">FlinkML</a> - Vision and Roadmap
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-* This will be replaced by the TOC
-{:toc}
-
-## Vision
-
-The Machine Learning (ML) library for Flink is a new effort to bring scalable ML tools to the Flink
-community. Our goal is is to design and implement a system that is scalable and can deal with
-problems of various sizes, whether your data size is measured in megabytes or terabytes and beyond.
-We call this library FlinkML.
-
-An important concern for developers of ML systems is the amount of glue code that developers are
-forced to write [1] in the process of implementing an end-to-end ML system. Our goal with FlinkML
-is to help developers keep glue code to a minimum. The Flink ecosystem provides a great setting to
-tackle this problem, with its scalable ETL capabilities that can be easily combined inside the same
-program with FlinkML, allowing the development of robust pipelines without the need to use yet
-another technology for data ingestion and data munging.
-
-Another goal for FlinkML is to make the library easy to use. To that end we will be providing
-detailed documentation along with examples for every part of the system. Our aim is that developers
-will be able to get started with writing their ML pipelines quickly, using familiar programming
-concepts and terminology.
-
-Contrary to other data-processing systems, Flink exploits in-memory data streaming, and natively
-executes iterative processing algorithms which are common in ML. We plan to exploit the streaming
-nature of Flink, and provide functionality designed specifically for data streams.
-
-FlinkML will allow data scientists to test their models locally and using subsets of data, and then
-use the same code to run their algorithms at a much larger scale in a cluster setting.
-
-We are inspired by other open source efforts to provide ML systems, in particular
-[scikit-learn](http://scikit-learn.org/) for cleanly specifying ML pipelines, and Spark’s
-[MLLib](https://spark.apache.org/mllib/) for providing ML algorithms that scale with problem and
-cluster sizes.
-
-## Roadmap
-
-The roadmap below can provide an indication of the algorithms we aim to implement in the coming
-months. If you are interested in helping out, please check our [contribution guide](contribution_guide.html).
-Items in **bold** have already been implemented:
-
-* Pipelines of transformers and learners
-* Data pre-processing
-  * **Feature scaling**
-  * **Polynomial feature base mapper**
-  * Feature hashing
-  * Feature extraction for text
-  * Dimensionality reduction
-* Model selection and performance evaluation
-  * Cross-validation for model selection and evaluation
-* Supervised learning
-  * Optimization framework
-    * **Stochastic Gradient Descent**
-    * L-BFGS
-  * Generalized Linear Models
-    * **Multiple linear regression**
-    * LASSO, Ridge regression
-    * Multi-class Logistic regression
-  * Random forests
-  * **Support Vector Machines**
-* Unsupervised learning
-  * Clustering
-    * K-means clustering
-  * PCA
-* Recommendation
-  * **ALS**
-* Text analytics
-  * LDA
-* Statistical estimation tools
-* Distributed linear algebra
-* Streaming ML
-
-**References:**
-
-[1] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary,
-and M. Young. _Machine learning: The high interest credit card of technical debt._ In SE4ML:
-Software Engineering for Machine Learning (NIPS 2014 Workshop), 2014.

