flink-commits mailing list archives

From u..@apache.org
Subject [2/6] flink git commit: [docs] Move libraries to batch and streaming guides
Date Wed, 03 Feb 2016 18:23:39 GMT
http://git-wip-us.apache.org/repos/asf/flink/blob/35ec26cd/docs/libs/ml/distance_metrics.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/distance_metrics.md b/docs/libs/ml/distance_metrics.md
index 1a7364a..270fbfa 100644
--- a/docs/libs/ml/distance_metrics.md
+++ b/docs/libs/ml/distance_metrics.md
@@ -1,11 +1,5 @@
 ---
-mathjax: include
 title: FlinkML - Distance Metrics
-
-# Sub navigation
-sub-nav-group: batch
-sub-nav-parent: flinkml
-sub-nav-title: Distance Metrics
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -26,86 +20,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-* This will be replaced by the TOC
-{:toc}
-
-## Description
-
-Different metrics of distance are convenient for different types of analysis. FlinkML provides
-built-in implementations for many standard distance metrics. You can create custom
-distance metrics by implementing the `DistanceMetric` trait.
-
-## Built-in Implementations
-
-Currently, FlinkML supports the following metrics:
-
-<table class="table table-bordered">
-    <thead>
-      <tr>
-        <th class="text-left" style="width: 20%">Metric</th>
-        <th class="text-center">Description</th>
-      </tr>
-    </thead>
-
-    <tbody>
-      <tr>
-        <td><strong>Euclidean Distance</strong></td>
-        <td>
-          $$d(\x, \y) = \sqrt{\sum_{i=1}^n \left(x_i - y_i \right)^2}$$
-        </td>
-      </tr>
-      <tr>
-        <td><strong>Squared Euclidean Distance</strong></td>
-        <td>
-          $$d(\x, \y) = \sum_{i=1}^n \left(x_i - y_i \right)^2$$
-        </td>
-      </tr>
-      <tr>
-        <td><strong>Cosine Similarity</strong></td>
-        <td>
-          $$d(\x, \y) = 1 - \frac{\x^T \y}{\Vert \x \Vert \Vert \y \Vert}$$
-        </td>
-      </tr>
-      <tr>
-        <td><strong>Chebyshev Distance</strong></td>
-        <td>
-          $$d(\x, \y) = \max_{i}\left(\left \vert x_i - y_i \right\vert \right)$$
-        </td>
-      </tr>
-      <tr>
-        <td><strong>Manhattan Distance</strong></td>
-        <td>
-          $$d(\x, \y) = \sum_{i=1}^n \left\vert x_i - y_i \right\vert$$
-        </td>
-      </tr>
-      <tr>
-        <td><strong>Minkowski Distance</strong></td>
-        <td>
-          $$d(\x, \y) = \left( \sum_{i=1}^{n} \left( x_i - y_i \right)^p \right)^{\rfrac{1}{p}}$$
-        </td>
-      </tr>
-      <tr>
-        <td><strong>Tanimoto Distance</strong></td>
-        <td>
-          $$d(\x, \y) = 1 - \frac{\x^T\y}{\Vert \x \Vert^2 + \Vert \y \Vert^2 - \x^T\y}$$ 
-          with $\x$ and $\y$ being bit-vectors
-        </td>
-      </tr>
-    </tbody>
-  </table>
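-
-For illustration, computing a built-in metric on two vectors could look like the following sketch (assuming the companion `apply` constructors of the metric classes and the `DenseVector` factory of FlinkML's math package):
-
-{% highlight scala %}
-// Hedged usage sketch: EuclideanDistanceMetric is one of the built-in metrics
-val a = DenseVector(1.0, 2.0)
-val b = DenseVector(4.0, 6.0)
-
-val euclidean = EuclideanDistanceMetric()
-
-// sqrt((4 - 1)^2 + (6 - 2)^2) = 5.0
-val dist = euclidean.distance(a, b)
-{% endhighlight %}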
-
-## Custom Implementation
-
-You can create your own distance metric by implementing the `DistanceMetric` trait.
-
-{% highlight scala %}
-class MyDistance extends DistanceMetric {
-  override def distance(a: Vector, b: Vector) = ... // your implementation for distance metric
-}
-
-object MyDistance {
-  def apply() = new MyDistance()
-}
+<meta http-equiv="refresh" content="1; url={{ site.baseurl }}/apis/batch/libs/ml/distance_metrics.html" />
 
-val myMetric = MyDistance()
-{% endhighlight %}
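-
-As a slightly more concrete sketch, a custom Manhattan-style metric could be written as follows (assuming only the `distance` method shown above and the `size` and `apply(i)` accessors of FlinkML's `Vector`):
-
-{% highlight scala %}
-class MyManhattanDistance extends DistanceMetric {
-  override def distance(a: Vector, b: Vector): Double = {
-    require(a.size == b.size, "Vectors must have the same dimension")
-
-    // Sum of absolute coordinate-wise differences
-    var sum = 0.0
-    var i = 0
-    while (i < a.size) {
-      sum += math.abs(a(i) - b(i))
-      i += 1
-    }
-    sum
-  }
-}
-{% endhighlight %}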
+The *distance metrics guide* has been moved. Redirecting to [{{ site.baseurl }}/apis/batch/libs/ml/distance_metrics.html]({{ site.baseurl }}/apis/batch/libs/ml/distance_metrics.html) in 1 second.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/flink/blob/35ec26cd/docs/libs/ml/index.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/index.md b/docs/libs/ml/index.md
index c3b6316..27b8364 100644
--- a/docs/libs/ml/index.md
+++ b/docs/libs/ml/index.md
@@ -1,15 +1,5 @@
 ---
 title: "FlinkML - Machine Learning for Flink"
-# Top navigation
-top-nav-group: libs
-top-nav-pos: 2
-top-nav-title: Machine Learning
-# Sub navigation
-sub-nav-group: batch
-sub-nav-id: flinkml
-sub-nav-pos: 2
-sub-nav-parent: libs
-sub-nav-title: Machine Learning
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -30,109 +20,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-FlinkML is the Machine Learning (ML) library for Flink. It is a new effort in the Flink community,
-with a growing list of algorithms and contributors. With FlinkML we aim to provide
-scalable ML algorithms, an intuitive API, and tools that help minimize glue code in end-to-end ML
-systems. You can see more details about our goals and where the library is headed in our [vision
-and roadmap here](https://cwiki.apache.org/confluence/display/FLINK/FlinkML%3A+Vision+and+Roadmap).
+<meta http-equiv="refresh" content="1; url={{ site.baseurl }}/apis/batch/libs/ml/index.html" />
 
-* This will be replaced by the TOC
-{:toc}
-
-## Supported Algorithms
-
-FlinkML currently supports the following algorithms:
-
-### Supervised Learning
-
-* [SVM using Communication-efficient distributed dual coordinate ascent (CoCoA)](svm.html)
-* [Multiple linear regression](multiple_linear_regression.html)
-* [Optimization Framework](optimization.html)
-
-### Data Preprocessing
-
-* [Polynomial Features](polynomial_features.html)
-* [Standard Scaler](standard_scaler.html)
-* [MinMax Scaler](min_max_scaler.html)
-
-### Recommendation
-
-* [Alternating Least Squares (ALS)](als.html)
-
-### Utilities
-
-* [Distance Metrics](distance_metrics.html)
-
-## Getting Started
-
-You can check out our [quickstart guide](quickstart.html) for a comprehensive getting started
-example.
-
-If you want to jump right in, you first have to [set up a Flink program]({{ site.baseurl }}/apis/batch/index.html#linking-with-flink).
-Next, add the FlinkML dependency to the `pom.xml` of your project.
-
-{% highlight xml %}
-<dependency>
-  <groupId>org.apache.flink</groupId>
-  <artifactId>flink-ml{{ site.scala_version_suffix }}</artifactId>
-  <version>{{ site.version }}</version>
-</dependency>
-{% endhighlight %}
-
-Note that FlinkML is currently not part of the binary distribution.
-See [here]({{site.baseurl}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution) how to link with it for cluster execution.
-
-Now you can start solving your analysis task.
-The following code snippet shows how easy it is to train a multiple linear regression model.
-
-{% highlight scala %}
-// LabeledVector is a feature vector with a label (class or real value)
-val trainingData: DataSet[LabeledVector] = ...
-val testingData: DataSet[Vector] = ...
-
-val mlr = MultipleLinearRegression()
-  .setStepsize(1.0)
-  .setIterations(100)
-  .setConvergenceThreshold(0.001)
-
-mlr.fit(trainingData)
-
-// The fitted model can now be used to make predictions
-val predictions: DataSet[LabeledVector] = mlr.predict(testingData)
-{% endhighlight %}
-
-## Pipelines
-
-A key concept of FlinkML is its [scikit-learn](http://scikit-learn.org) inspired pipelining mechanism.
-It allows you to quickly build complex data analysis pipelines as they appear in every data scientist's daily work.
-An in-depth description of FlinkML's pipelines and their internal workings can be found [here](pipelines.html).
-
-The following example code shows how easy it is to set up an analysis pipeline with FlinkML.
-
-{% highlight scala %}
-val trainingData: DataSet[LabeledVector] = ...
-val testingData: DataSet[Vector] = ...
-
-val scaler = StandardScaler()
-val polyFeatures = PolynomialFeatures().setDegree(3)
-val mlr = MultipleLinearRegression()
-
-// Construct pipeline of standard scaler, polynomial features and multiple linear regression
-val pipeline = scaler.chainTransformer(polyFeatures).chainPredictor(mlr)
-
-// Train pipeline
-pipeline.fit(trainingData)
-
-// Calculate predictions
-val predictions: DataSet[LabeledVector] = pipeline.predict(testingData)
-{% endhighlight %}
-
-One can chain a `Transformer` to another `Transformer` or a set of chained `Transformers` by calling the method `chainTransformer`.
-If one wants to chain a `Predictor` to a `Transformer` or a set of chained `Transformers`, one has to call the method `chainPredictor`.
-
-
-## How to contribute
-
-The Flink community welcomes all contributors who want to get involved in the development of Flink and its libraries.
-To get started quickly with contributing to FlinkML, please read our official
-[contribution guide]({{site.baseurl}}/libs/ml/contribution_guide.html).
+The *Machine Learning guide* has been moved. Redirecting to [{{ site.baseurl }}/apis/batch/libs/ml/index.html]({{ site.baseurl }}/apis/batch/libs/ml/index.html) in 1 second.

http://git-wip-us.apache.org/repos/asf/flink/blob/35ec26cd/docs/libs/ml/min_max_scaler.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/min_max_scaler.md b/docs/libs/ml/min_max_scaler.md
index 302bf4d..e9f6e92 100644
--- a/docs/libs/ml/min_max_scaler.md
+++ b/docs/libs/ml/min_max_scaler.md
@@ -1,11 +1,5 @@
 ---
-mathjax: include
-title: <a href="../ml">FlinkML</a> - MinMax Scaler
-
-# Sub navigation
-sub-nav-group: batch
-sub-nav-parent: flinkml
-sub-nav-title: MinMax Scaler
+title: FlinkML - MinMax Scaler
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -26,91 +20,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-* This will be replaced by the TOC
-{:toc}
-
-## Description
-
- The MinMax scaler scales the given data set so that all values lie within a user-specified range [min, max].
- In case the user does not provide specific minimum and maximum values for the scaling range, the MinMax scaler transforms the features of the input data set to lie in the [0, 1] interval.
- Given a set of input data $x_1, x_2,... x_n$, with minimum value:
-
- $$x_{min} = min({x_1, x_2,..., x_n})$$
-
- and maximum value:
-
- $$x_{max} = max({x_1, x_2,..., x_n})$$
-
-The scaled data set $z_1, z_2,...,z_n$ will be:
-
- $$z_{i}= \frac{x_{i} - x_{min}}{x_{max} - x_{min}} \left ( max - min \right ) + min$$
-
-where $\textit{min}$ and $\textit{max}$ are the user-specified minimum and maximum values of the range to scale.
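-
-For example (an illustrative calculation): scaling the data set $\{1, 3, 5\}$ to the default range $[0, 1]$ gives $x_{min} = 1$ and $x_{max} = 5$, so the middle value is mapped to $z_{2} = \frac{3 - 1}{5 - 1}\left(1 - 0\right) + 0 = 0.5$.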
-
-## Operations
-
-`MinMaxScaler` is a `Transformer`.
-As such, it supports the `fit` and `transform` operations.
-
-### Fit
-
-MinMaxScaler is trained on all subtypes of `Vector` or `LabeledVector`:
-
-* `fit[T <: Vector]: DataSet[T] => Unit`
-* `fit: DataSet[LabeledVector] => Unit`
-
-### Transform
-
-MinMaxScaler transforms all subtypes of `Vector` or `LabeledVector` into the respective type:
-
-* `transform[T <: Vector]: DataSet[T] => DataSet[T]`
-* `transform: DataSet[LabeledVector] => DataSet[LabeledVector]`
-
-## Parameters
-
-The MinMax scaler implementation can be controlled by the following two parameters:
-
- <table class="table table-bordered">
-  <thead>
-    <tr>
-      <th class="text-left" style="width: 20%">Parameters</th>
-      <th class="text-center">Description</th>
-    </tr>
-  </thead>
-
-  <tbody>
-    <tr>
-      <td><strong>Min</strong></td>
-      <td>
-        <p>
-          The minimum value of the range for the scaled data set. (Default value: <strong>0.0</strong>)
-        </p>
-      </td>
-    </tr>
-    <tr>
-      <td><strong>Max</strong></td>
-      <td>
-        <p>
-          The maximum value of the range for the scaled data set. (Default value: <strong>1.0</strong>)
-        </p>
-      </td>
-    </tr>
-  </tbody>
-</table>
-
-## Examples
-
-{% highlight scala %}
-// Create MinMax scaler transformer
-val minMaxscaler = MinMaxScaler()
-  .setMin(-1.0)
-
-// Obtain data set to be scaled
-val dataSet: DataSet[Vector] = ...
-
-// Learn the minimum and maximum values of the training data
-minMaxscaler.fit(dataSet)
+<meta http-equiv="refresh" content="1; url={{ site.baseurl }}/apis/batch/libs/ml/min_max_scaler.html" />
 
-// Scale the provided data set to have min=-1.0 and max=1.0
-val scaledDS = minMaxscaler.transform(dataSet)
-{% endhighlight %}
+The *MinMax Scaler guide* has been moved. Redirecting to [{{ site.baseurl }}/apis/batch/libs/ml/min_max_scaler.html]({{ site.baseurl }}/apis/batch/libs/ml/min_max_scaler.html) in 1 second.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/flink/blob/35ec26cd/docs/libs/ml/multiple_linear_regression.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/multiple_linear_regression.md b/docs/libs/ml/multiple_linear_regression.md
index 58c4851..ef1db2c 100644
--- a/docs/libs/ml/multiple_linear_regression.md
+++ b/docs/libs/ml/multiple_linear_regression.md
@@ -1,11 +1,5 @@
 ---
-mathjax: include
 title: FlinkML - Multiple linear regression
-
-# Sub navigation
-sub-nav-group: batch
-sub-nav-parent: flinkml
-sub-nav-title: Multiple Linear Regression
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -26,139 +20,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-* This will be replaced by the TOC
-{:toc}
-
-## Description
-
- Multiple linear regression tries to find a linear function which best fits the provided input data.
- Given a set of input data points with their values $(\mathbf{x_i}, y_i)$, multiple linear regression finds
- a vector $\mathbf{w}$ such that the sum of the squared residuals is minimized:
-
- $$ S(\mathbf{w}) = \sum_{i=1}^n \left(y_i - \mathbf{w}^T\mathbf{x_i} \right)^2$$
-
- Written in matrix notation, we obtain the following formulation:
-
- $$\mathbf{w}^* = \arg \min_{\mathbf{w}} \Vert \mathbf{y} - X\mathbf{w} \Vert^2$$
-
- This problem has a closed form solution which is given by:
-
-  $$\mathbf{w}^* = \left(X^TX\right)^{-1}X^T\mathbf{y}$$
-
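-  For small systems this closed-form solution can be computed directly, for example with Breeze
-  (an illustrative sketch, not part of FlinkML's API):
-
-{% highlight scala %}
-import breeze.linalg.{inv, DenseMatrix, DenseVector}
-
-// Design matrix with an intercept column of ones; the targets lie on y = x
-val X = DenseMatrix((1.0, 1.0), (1.0, 2.0), (1.0, 3.0))
-val y = DenseVector(1.0, 2.0, 3.0)
-
-// w* = (X^T X)^{-1} X^T y, here approximately (0.0, 1.0)
-val w = inv(X.t * X) * (X.t * y)
-{% endhighlight %}
-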
-  However, in cases where the input data set is so huge that a complete parse over the whole data
-  set is prohibitive, one can apply stochastic gradient descent (SGD) to approximate the solution.
-  SGD first calculates the gradients for a random subset of the input data set. The gradient
-  for a given point $\mathbf{x_i}$ is given by:
-
-  $$\nabla_{\mathbf{w}} S(\mathbf{w}, \mathbf{x_i}) = 2\left(\mathbf{w}^T\mathbf{x_i} -
-    y_i\right)\mathbf{x_i}$$
-
-  The gradients are averaged and scaled. The scaling is defined by $\gamma = \frac{s}{\sqrt{j}}$
-  with $s$ being the initial step size and $j$ being the current iteration number. The resulting gradient is subtracted from the
-  current weight vector giving the new weight vector for the next iteration:
-
-  $$\mathbf{w}_{t+1} = \mathbf{w}_t - \gamma \frac{1}{n}\sum_{i=1}^n \nabla_{\mathbf{w}} S(\mathbf{w}, \mathbf{x_i})$$
-
-  The multiple linear regression algorithm either runs a fixed number of SGD iterations or terminates based on a dynamic convergence criterion.
-  The convergence criterion is the relative change in the sum of squared residuals:
-
-  $$\frac{S_{k-1} - S_k}{S_{k-1}} < \rho$$
-  
-## Operations
-
-`MultipleLinearRegression` is a `Predictor`.
-As such, it supports the `fit` and `predict` operations.
-
-### Fit
-
-MultipleLinearRegression is trained on a set of `LabeledVector`: 
-
-* `fit: DataSet[LabeledVector] => Unit`
-
-### Predict
-
-MultipleLinearRegression predicts for all subtypes of `Vector` the corresponding regression value: 
-
-* `predict[T <: Vector]: DataSet[T] => DataSet[LabeledVector]`
-
-If we call predict with a `DataSet[LabeledVector]`, we make a prediction on the regression value
-for each example, and return a `DataSet[(Double, Double)]`. In each tuple the first element
-is the true value, as provided by the input `DataSet[LabeledVector]`, and the second element
-is the predicted value. You can then use these `(truth, prediction)` tuples to evaluate
-the algorithm's performance.
-
-* `predict: DataSet[LabeledVector] => DataSet[(Double, Double)]`
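-
-For instance, a mean squared error over these pairs could be computed like this (an illustrative sketch; `labeledTestingDS` is an assumed `DataSet[LabeledVector]` and `mlr` an already fitted model):
-
-{% highlight scala %}
-val evaluationDS: DataSet[(Double, Double)] = mlr.predict(labeledTestingDS)
-
-// Average the squared differences between truth and prediction
-val mse: DataSet[Double] = evaluationDS
-  .map { pair => (math.pow(pair._1 - pair._2, 2), 1) }
-  .reduce { (left, right) => (left._1 + right._1, left._2 + right._2) }
-  .map { sumCount => sumCount._1 / sumCount._2 }
-{% endhighlight %}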
-
-## Parameters
-
-  The multiple linear regression implementation can be controlled by the following parameters:
-  
-   <table class="table table-bordered">
-    <thead>
-      <tr>
-        <th class="text-left" style="width: 20%">Parameters</th>
-        <th class="text-center">Description</th>
-      </tr>
-    </thead>
-
-    <tbody>
-      <tr>
-        <td><strong>Iterations</strong></td>
-        <td>
-          <p>
-            The maximum number of iterations. (Default value: <strong>10</strong>)
-          </p>
-        </td>
-      </tr>
-      <tr>
-        <td><strong>Stepsize</strong></td>
-        <td>
-          <p>
-            Initial step size for the gradient descent method.
-            This value controls how far the gradient descent method moves in the opposite direction of the gradient.
-            Tuning this parameter might be crucial to make it stable and to obtain a better performance. 
-            (Default value: <strong>0.1</strong>)
-          </p>
-        </td>
-      </tr>
-      <tr>
-        <td><strong>ConvergenceThreshold</strong></td>
-        <td>
-          <p>
-            Threshold for relative change of the sum of squared residuals until the iteration is stopped.
-            (Default value: <strong>None</strong>)
-          </p>
-        </td>
-      </tr>
-      <tr>
-        <td><strong>LearningRateMethod</strong></td>
-        <td>
-            <p>
-                Learning rate method used to calculate the effective learning rate for each iteration.
-                See the list of supported <a href="optimization.html">learning rate methods</a>.
-                (Default value: <strong>LearningRateMethod.Default</strong>)
-            </p>
-        </td>
-      </tr>
-    </tbody>
-  </table>
-
-## Examples
-
-{% highlight scala %}
-// Create multiple linear regression learner
-val mlr = MultipleLinearRegression()
-  .setIterations(10)
-  .setStepsize(0.5)
-  .setConvergenceThreshold(0.001)
-
-// Obtain training and testing data set
-val trainingDS: DataSet[LabeledVector] = ...
-val testingDS: DataSet[Vector] = ...
-
-// Fit the linear model to the provided data
-mlr.fit(trainingDS)
+<meta http-equiv="refresh" content="1; url={{ site.baseurl }}/apis/batch/libs/ml/multiple_linear_regression.html" />
 
-// Calculate the predictions for the test data
-val predictions = mlr.predict(testingDS)
-{% endhighlight %}
+The *multiple linear regression guide* has been moved. Redirecting to [{{ site.baseurl }}/apis/batch/libs/ml/multiple_linear_regression.html]({{ site.baseurl }}/apis/batch/libs/ml/multiple_linear_regression.html) in 1 second.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/flink/blob/35ec26cd/docs/libs/ml/optimization.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/optimization.md b/docs/libs/ml/optimization.md
index 08dcab7..d7b555e 100644
--- a/docs/libs/ml/optimization.md
+++ b/docs/libs/ml/optimization.md
@@ -1,10 +1,5 @@
 ---
-mathjax: include
 title: FlinkML - Optimization
-# Sub navigation
-sub-nav-group: batch
-sub-nav-parent: flinkml
-sub-nav-title: Optimization
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -25,361 +20,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-* Table of contents
-{:toc}
+<meta http-equiv="refresh" content="1; url={{ site.baseurl }}/apis/batch/libs/ml/optimization.html" />
 
-## Mathematical Formulation
-
-The optimization framework in FlinkML is a developer-oriented package that can be used to solve
-[optimization](https://en.wikipedia.org/wiki/Mathematical_optimization)
-problems common in Machine Learning (ML) tasks. In the supervised learning context, this usually
-involves finding a model, as defined by a set of parameters $\wv$, that minimizes a function $f(\wv)$
-given a set of $(\x, y)$ examples,
-where $\x$ is a feature vector and $y$ is a real number, which can represent either a real value in
-the regression case, or a class label in the classification case. In supervised learning, the
-function to be minimized is usually of the form:
-
-
-\begin{equation} \label{eq:objectiveFunc}
-    f(\wv) :=
-    \frac1n \sum_{i=1}^n L(\wv;\x_i,y_i) +
-    \lambda\, R(\wv)
-    \ .
-\end{equation}
-
-
-where $L$ is the loss function and $R(\wv)$ the regularization penalty. We use $L$ to measure how
-well the model fits the observed data, and we use $R$ in order to impose a complexity cost to the
-model, with $\lambda > 0$ being the regularization parameter.
-
-### Loss Functions
-
-In supervised learning, we use loss functions in order to measure the model fit, by
-penalizing errors in the predictions $p$ made by the model compared to the true $y$ for each
-example. Different loss functions can be used for regression (e.g. Squared Loss) and classification
-(e.g. Hinge Loss) tasks.
-
-Some common loss functions are:
-
-* Squared Loss: $ \frac{1}{2} \left(\wv^T \cdot \x - y\right)^2, \quad y \in \R $
-* Hinge Loss: $ \max \left(0, 1 - y ~ \wv^T \cdot \x\right), \quad y \in \{-1, +1\} $
-* Logistic Loss: $ \log\left(1+\exp\left( -y ~ \wv^T \cdot \x\right)\right), \quad y \in \{-1, +1\}$
-
-### Regularization Types
-
-[Regularization](https://en.wikipedia.org/wiki/Regularization_(mathematics)) in machine learning
-imposes penalties to the estimated models, in order to reduce overfitting. The most common penalties
-are the $L_1$ and $L_2$ penalties, defined as:
-
-* $L_1$: $R(\wv) = \norm{\wv}_1$
-* $L_2$: $R(\wv) = \frac{1}{2}\norm{\wv}_2^2$
-
-The $L_2$ penalty penalizes large weights, favoring solutions with more small weights rather than
-few large ones.
-The $L_1$ penalty can be used to drive a number of the solution coefficients to 0, thereby
-producing sparse solutions.
-The regularization constant $\lambda$ in $\eqref{eq:objectiveFunc}$ determines the amount of regularization applied to the model,
-and is usually determined through model cross-validation.
-A good comparison of regularization types can be found in [this](http://www.robotics.stanford.edu/~ang/papers/icml04-l1l2.pdf) paper by Andrew Ng.
-Which regularization type is supported depends on the optimization algorithm that is actually used.
-
-## Stochastic Gradient Descent
-
-In order to find a (local) minimum of a function, Gradient Descent methods take steps in the
-direction opposite to the gradient of the function $\eqref{eq:objectiveFunc}$ taken with
-respect to the current parameters (weights).
-In order to compute the exact gradient we need to perform one pass through all the points in
-a dataset, making the process computationally expensive.
-An alternative is Stochastic Gradient Descent (SGD) where at each iteration we sample one point
-from the complete dataset and update the parameters for each point, in an online manner.
-
-In mini-batch SGD we instead sample random subsets of the dataset, and compute the gradient
-over each batch. At each iteration of the algorithm we update the weights once, based on
-the average of the gradients computed from each mini-batch.
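-
-Using the notation of $\eqref{eq:objectiveFunc}$, a single mini-batch update over a batch $B$ with step size $\eta$ then has the form:
-
-$$\wv_{t+1} = \wv_t - \eta \left( \frac{1}{|B|} \sum_{i \in B} \nabla_\wv L(\wv_t;\x_i,y_i) + \lambda\, \nabla_\wv R(\wv_t) \right)$$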
-
-An important parameter is the learning rate $\eta$, or step size, which can be determined by one of five methods, listed below. The setting of the initial step size can significantly affect the performance of the
-algorithm. For some practical tips on tuning SGD see Leon Bottou's
-"[Stochastic Gradient Descent Tricks](http://research.microsoft.com/pubs/192769/tricks-2012.pdf)".
-
-The current implementation of SGD  uses the whole partition, making it
-effectively a batch gradient descent. Once a sampling operator has been introduced in Flink, true
-mini-batch SGD will be performed.
-
-### Regularization
-
-FlinkML supports Stochastic Gradient Descent with L1, L2 and no regularization.
-The following list contains a mapping between the implementing classes and the regularization function.
-
-<table class="table table-bordered">
-  <thead>
-    <tr>
-      <th class="text-left" style="width: 20%">Class Name</th>
-      <th class="text-center">Regularization function $R(\wv)$</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td><code>SimpleGradient</code></td>
-      <td>$R(\wv) = 0$</td>
-    </tr>
-    <tr>
-      <td><code>GradientDescentL1</code></td>
-      <td>$R(\wv) = \norm{\wv}_1$</td>
-    </tr>
-    <tr>
-      <td><code>GradientDescentL2</code></td>
-      <td>$R(\wv) = \frac{1}{2}\norm{\wv}_2^2$</td>
-    </tr>
-  </tbody>
-</table>
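-
-For instance, an L2-regularized solver can be created as follows (a sketch mirroring the SGD example at the end of this page; the companion `apply` of `GradientDescentL2` is assumed to work like that of `GradientDescentL1`):
-
-{% highlight scala %}
-// SGD with L2 regularization, R(w) = 0.5 * ||w||_2^2, weighted by the constant below
-val solver = GradientDescentL2()
-  .setRegularizationConstant(0.01)
-{% endhighlight %}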
-
-### Parameters
-
-  The stochastic gradient descent implementation can be controlled by the following parameters:
-
-   <table class="table table-bordered">
-    <thead>
-      <tr>
-        <th class="text-left" style="width: 20%">Parameter</th>
-        <th class="text-center">Description</th>
-      </tr>
-    </thead>
-    <tbody>
-      <tr>
-        <td><strong>LossFunction</strong></td>
-        <td>
-          <p>
-            The loss function to be optimized. (Default value: <strong>None</strong>)
-          </p>
-        </td>
-      </tr>
-      <tr>
-        <td><strong>RegularizationConstant</strong></td>
-        <td>
-          <p>
-            The amount of regularization to apply. (Default value: <strong>0.1</strong>)
-          </p>
-        </td>
-      </tr>
-      <tr>
-        <td><strong>Iterations</strong></td>
-        <td>
-          <p>
-            The maximum number of iterations. (Default value: <strong>10</strong>)
-          </p>
-        </td>
-      </tr>
-      <tr>
-        <td><strong>LearningRate</strong></td>
-        <td>
-          <p>
-            Initial learning rate for the gradient descent method.
-            This value controls how far the gradient descent method moves in the opposite direction
-            of the gradient.
-            (Default value: <strong>0.1</strong>)
-          </p>
-        </td>
-      </tr>
-      <tr>
-        <td><strong>ConvergenceThreshold</strong></td>
-        <td>
-          <p>
-            When set, iterations stop if the relative change in the value of the objective function $\eqref{eq:objectiveFunc}$ is less than the provided threshold, $\tau$.
-            The convergence criterion is defined as follows: $\left| \frac{f(\wv)_{i-1} - f(\wv)_i}{f(\wv)_{i-1}}\right| < \tau$.
-            (Default value: <strong>None</strong>)
-          </p>
-        </td>
-      </tr>
-      <tr>
-        <td><strong>LearningRateMethod</strong></td>
-        <td>
-            <p>
-                The learning rate method used to calculate the effective learning rate for each iteration.
-                See the list of supported <a href="#effective-learning-rate">learning rate methods</a> below.
-                (Default value: <strong>LearningRateMethod.Default</strong>)
-            </p>
-        </td>
-      </tr>
-      <tr>
-        <td><strong>Decay</strong></td>
-        <td>
-          <p>
-            (Default value: <strong>0.0</strong>)
-          </p>
-        </td>
-      </tr>
-    </tbody>
-  </table>
-
-### Loss Function
-
-The loss function to be minimized has to implement the `LossFunction` interface, which defines methods to compute the loss and its gradient.
-You can either define your own `LossFunction` or use the `GenericLossFunction` class, which constructs the loss function from an outer loss function and a prediction function.
-An example can be seen here:
-
-{% highlight scala %}
-val lossFunction = GenericLossFunction(SquaredLoss, LinearPrediction)
-{% endhighlight %}
-
-The full list of supported outer loss functions can be found [here](#partial-loss-function-values).
-The full list of supported prediction functions can be found [here](#prediction-function-values).
-
-#### Partial Loss Function Values
-
-  <table class="table table-bordered">
-    <thead>
-      <tr>
-        <th class="text-left" style="width: 20%">Function Name</th>
-        <th class="text-center">Description</th>
-        <th class="text-center">Loss</th>
-        <th class="text-center">Loss Derivative</th>
-      </tr>
-    </thead>
-    <tbody>
-      <tr>
-        <td><strong>SquaredLoss</strong></td>
-        <td>
-          <p>
-            Loss function most commonly used for regression tasks.
-          </p>
-        </td>
-        <td class="text-center">$\frac{1}{2} (\wv^T \cdot \x - y)^2$</td>
-        <td class="text-center">$\wv^T \cdot \x - y$</td>
-      </tr>
-    </tbody>
-  </table>
-
-#### Prediction Function Values
-
-  <table class="table table-bordered">
-      <thead>
-        <tr>
-          <th class="text-left" style="width: 20%">Function Name</th>
-          <th class="text-center">Description</th>
-          <th class="text-center">Prediction</th>
-          <th class="text-center">Prediction Gradient</th>
-        </tr>
-      </thead>
-      <tbody>
-        <tr>
-          <td><strong>LinearPrediction</strong></td>
-          <td>
-            <p>
-              The function most commonly used for linear models, such as linear regression and
-              linear classifiers.
-            </p>
-          </td>
-          <td class="text-center">$\x^T \cdot \wv$</td>
-          <td class="text-center">$\x$</td>
-        </tr>
-      </tbody>
-    </table>
-
-#### Effective Learning Rate
-
-Each method below computes the effective learning rate from the following quantities:
-
-- $j$ is the iteration number
-
-- $\eta_j$ is the step size on step $j$
-
-- $\eta_0$ is the initial step size
-
-- $\lambda$ is the regularization constant
-
-- $\tau$ is the decay constant, which causes the learning rate to be a decreasing function of $j$; that is, the learning rate decreases as iterations increase. The exact rate of decay is function-specific; see **Inverse Scaling** and **Wei Xu's Method** (which is an extension of the **Inverse Scaling** method).
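-
-For example, with the **Default** method and an initial step size of $\eta_0 = 0.1$, the fourth iteration uses $\eta_4 = 0.1 / \sqrt{4} = 0.05$.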
-
-<table class="table table-bordered">
-    <thead>
-      <tr>
-        <th class="text-left" style="width: 20%">Function Name</th>
-        <th class="text-center">Description</th>
-        <th class="text-center">Function</th>
-        <th class="text-center">Called As</th>
-      </tr>
-    </thead>
-    <tbody>
-      <tr>
-        <td><strong>Default</strong></td>
-        <td>
-          <p>
-            The default method used for determining the step size. This is equivalent to the inverse scaling method for $\tau = 0.5$. This special case is kept as the default to maintain backwards compatibility.
-          </p>
-        </td>
-        <td class="text-center">$\eta_j = \eta_0/\sqrt{j}$</td>
-        <td class="text-center"><code>LearningRateMethod.Default</code></td>
-      </tr>
-      <tr>
-        <td><strong>Constant</strong></td>
-        <td>
-          <p> 
-            The step size is constant throughout the learning task.
-          </p>
-        </td>
-        <td class="text-center">$\eta_j = \eta_0$</td>
-        <td class="text-center"><code>LearningRateMethod.Constant</code></td>
-      </tr>
-      <tr>
-        <td><strong>Leon Bottou's Method</strong></td>
-        <td>
-          <p>
-            This is the <code>'optimal'</code> method of sklearn. 
-            The optimal initial value $t_0$ has to be provided.
-            Sklearn uses the following heuristic: $t_0 = \max(1.0, L^\prime(-\beta, 1.0) / (\alpha \cdot \beta))$
-            with $\beta = \sqrt{\frac{1}{\sqrt{\alpha}}}$ and $L^\prime(prediction, truth)$ being the derivative of the loss function. 
-          </p>
-        </td>
-        <td class="text-center">$\eta_j = 1 / (\lambda \cdot (t_0 + j -1)) $</td>
-        <td class="text-center"><code>LearningRateMethod.Bottou</code></td>
-      </tr>
-      <tr>
-        <td><strong>Inverse Scaling</strong></td>
-        <td>
-          <p>
-            A very common method for determining the step size.
-          </p>
-        </td>
-        <td class="text-center">$\eta_j = \eta_0 / j^{\tau}$</td>
-        <td class="text-center"><code>LearningRateMethod.InvScaling</code></td>
-      </tr>
-      <tr>
-        <td><strong>Wei Xu's Method</strong></td>
-        <td>
-          <p>
-            Method proposed by Wei Xu in <a href="http://arxiv.org/pdf/1107.2490.pdf">Towards Optimal One Pass Large Scale Learning with
-            Averaged Stochastic Gradient Descent</a>
-          </p>
-        </td>
-        <td class="text-center">$\eta_j = \eta_0 \cdot (1+ \lambda \cdot \eta_0 \cdot j)^{-\tau} $</td>
-        <td class="text-center"><code>LearningRateMethod.Xu</code></td>
-      </tr>
-    </tbody>
-  </table>
-
-### Examples
-
-In the Flink implementation of SGD, given a set of examples in a `DataSet[LabeledVector]` and
-optionally some initial weights, we can use `GradientDescentL1.optimize()` in order to optimize
-the weights for the given data.
-
-The user can provide an initial `DataSet[WeightVector]`,
-which contains one `WeightVector` element, or use the default weights which are all set to 0.
-A `WeightVector` is a container class for the weights, which separates the intercept from the
-weight vector. This allows us to avoid applying regularization to the intercept.
-
-
-
-{% highlight scala %}
-// Create stochastic gradient descent solver
-val sgd = GradientDescentL1()
-  .setLossFunction(SquaredLoss())
-  .setRegularizationConstant(0.2)
-  .setIterations(100)
-  .setLearningRate(0.01)
-  .setLearningRateMethod(LearningRateMethod.Xu(-0.75))
-
-
-// Obtain data
-val trainingDS: DataSet[LabeledVector] = ...
-
-// Optimize the weights, according to the provided data
-val weightDS = sgd.optimize(trainingDS)
-{% endhighlight %}
+The *optimization guide* has been moved. Redirecting to [{{ site.baseurl }}/apis/batch/libs/ml/optimization.html]({{ site.baseurl }}/apis/batch/libs/ml/optimization.html) in 1 second.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/flink/blob/35ec26cd/docs/libs/ml/pipelines.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/pipelines.md b/docs/libs/ml/pipelines.md
index 429156d..4b0f372 100644
--- a/docs/libs/ml/pipelines.md
+++ b/docs/libs/ml/pipelines.md
@@ -1,10 +1,5 @@
 ---
-mathjax: include
 title: FlinkML - Looking under the hood of pipelines
-# Sub navigation
-sub-nav-group: batch
-sub-nav-parent: flinkml
-sub-nav-title: Pipelines
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -25,421 +20,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-* This will be replaced by the TOC
-{:toc}
-
-## Introduction
-
-The ability to chain together different transformers and predictors is an important feature for
-any Machine Learning (ML) library. In FlinkML we wanted to provide an intuitive API,
-and at the same
-time utilize the capabilities of the Scala language to provide
-type-safe implementations of our pipelines. What we hope to achieve is an easy-to-use API
-that protects users from type errors at pre-flight time (before the job is launched), thereby
-eliminating cases where long
-running jobs are submitted to the cluster only to see them fail due to some
-error in the series of data transformations that commonly happen in an ML pipeline.
-
-In this guide we will describe the choices we made during the implementation of chainable
-transformers and predictors in FlinkML, and provide guidelines on how developers can create their
-own algorithms that make use of these capabilities.
-
-## The what and the why
-
-So what do we mean by "ML pipelines"? Pipelines in the ML context can be thought of as chains of
-operations that have some data as input, perform a number of transformations to that data,
-and
-then output the transformed data, either to be used as the input (features) of a predictor
-function, such as a learning model, or to be used directly in some other task.
-The end learner can of course be a part of the pipeline as well.
-ML pipelines can often be complicated sets of operations ([in-depth explanation](http://research.google.com/pubs/pub43146.html)) and
-can become sources of errors for end-to-end learning systems.
-
-The purpose of ML pipelines is then to create a
-framework that can be used to manage the complexity introduced by these chains of operations.
-Pipelines should make it easy for developers to define chained transformations that can be
-applied to the
-training data, in order to create the end features that will be used to train a
-learning model, and then perform the same set of transformations just as easily to unlabeled
-(test) data. Pipelines should also simplify cross-validation and model selection on
-these chains of operations.
-
-Finally, by ensuring that the consecutive links in the pipeline chain "fit together" we also
-avoid costly type errors. Since each step in a pipeline can be a computationally-heavy operation,
-we want to avoid running a pipelined job, unless we are sure that all the input/output pairs in a
-pipeline "fit".
-
-## Pipelines in FlinkML
-
-The building blocks for pipelines in FlinkML can be found in the `ml.pipeline` package.
-FlinkML follows an API inspired by [sklearn](http://scikit-learn.org) which means that we have
-`Estimator`, `Transformer` and `Predictor` interfaces. For an in-depth look at the design of the
-sklearn API the interested reader is referred to [this](http://arxiv.org/abs/1309.0238) paper.
-In short, the `Estimator` is the base class from which `Transformer` and `Predictor` inherit.
-`Estimator` defines a `fit` method; `Transformer` additionally defines a `transform` method,
-and `Predictor` defines a `predict` method.
-
-The `fit` method of the `Estimator` performs the actual training of the model, for example
-finding the correct weights in a linear regression task, or the mean and standard deviation of
-the data in a feature scaler.
-As evident by the naming, classes that implement
-`Transformer` are transform operations like [scaling the input](standard_scaler.html) and
-`Predictor` implementations are learning algorithms such as [Multiple Linear Regression]({{site.baseurl}}/libs/ml/multiple_linear_regression.html).
-Pipelines can be created by chaining together a number of Transformers, and the final link in a pipeline can be a Predictor or another Transformer.
-Pipelines that end with a Predictor cannot be chained any further.
-Below is an example of how a pipeline can be formed:
-
-{% highlight scala %}
-// Training data
-val input: DataSet[LabeledVector] = ...
-// Test data
-val unlabeled: DataSet[Vector] = ...
-
-val scaler = StandardScaler()
-val polyFeatures = PolynomialFeatures()
-val mlr = MultipleLinearRegression()
-
-// Construct the pipeline
-val pipeline = scaler
-  .chainTransformer(polyFeatures)
-  .chainPredictor(mlr)
-
-// Train the pipeline (scaler and multiple linear regression)
-pipeline.fit(input)
-
-// Calculate predictions for the testing data
-val predictions: DataSet[LabeledVector] = pipeline.predict(unlabeled)
-
-{% endhighlight %}
-
-As we mentioned, FlinkML pipelines are type-safe.
-If we tried to chain a transformer with output of type `A` to another with input of type `B` we
-would get an error at pre-flight time if `A` != `B`. FlinkML achieves this kind of type-safety
-through the use of Scala's implicits.
-
-### Scala implicits
-
-If you are not familiar with Scala's implicits we can recommend [this excerpt](https://www.artima.com/pins1ed/implicit-conversions-and-parameters.html)
-from Martin Odersky's "Programming in Scala". In short, implicit conversions allow for ad-hoc
-polymorphism in Scala by providing conversions from one type to another, and implicit values
-provide the compiler with default values that can be supplied to function calls through implicit parameters.
-The combination of implicit conversions and implicit parameters is what allows us to chain transform
-and predict operations together in a type-safe manner.
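-
-As a generic illustration of this mechanism (made-up types, unrelated to FlinkML's actual classes), implicit parameters let the compiler pick an operation object based on the involved types:
-
-{% highlight scala %}
-trait Show[T] {
-  def show(t: T): String
-}
-
-object Show {
-  // Found automatically because it lives in the companion object of Show
-  implicit val intShow: Show[Int] = new Show[Int] {
-    override def show(t: Int): String = s"Int($t)"
-  }
-}
-
-def render[T](value: T)(implicit s: Show[T]): String = s.show(value)
-
-render(42)       // compiles, Show[Int] is in the implicit scope
-// render("hi")  // would fail at compile time: no Show[String] available
-{% endhighlight %}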
-
-### Operations
-
-As we mentioned, the trait (abstract class) `Estimator` defines a `fit` method. The method has two
-parameter lists
-(i.e. it is a [curried function](http://docs.scala-lang.org/tutorials/tour/currying.html)). The
-first parameter list
-takes the input (training) `DataSet` and the parameters for the estimator. The second parameter
-list takes one `implicit` parameter, of type `FitOperation`. `FitOperation` is a class that also
-defines a `fit` method, and this is where the actual logic of training the concrete Estimators
-should be implemented. The `fit` method of `Estimator` is essentially a wrapper around the `fit`
-method of `FitOperation`. The `predict` method of `Predictor` and the `transform` method of
-`Transformer` are designed in a similar manner, with a respective operation class.
-
-In these methods the operation object is provided as an implicit parameter.
-Scala will [look for implicits](http://docs.scala-lang.org/tutorials/FAQ/finding-implicits.html)
-in the companion object of a type, so classes that implement these interfaces should provide these 
-objects as implicit objects inside the companion object.
-
-As an example we can look at the `StandardScaler` class. `StandardScaler` extends `Transformer`, so it has access to its `fit` and `transform` functions.
-These two functions expect objects of `FitOperation` and `TransformOperation` as implicit parameters, 
-for the `fit` and `transform` methods respectively, which `StandardScaler` provides in its companion 
-object, through `transformVectors` and `fitVectorStandardScaler`:
-
-{% highlight scala %}
-class StandardScaler extends Transformer[StandardScaler] {
-  ...
-}
-
-object StandardScaler {
-
-  ...
-
-  implicit def fitVectorStandardScaler[T <: Vector] = new FitOperation[StandardScaler, T] {
-    override def fit(instance: StandardScaler, fitParameters: ParameterMap, input: DataSet[T])
-      : Unit = {
-        ...
-      }
-  }
-
-  implicit def transformVectors[T <: Vector: VectorConverter: TypeInformation: ClassTag] = {
-    new TransformOperation[StandardScaler, T, T] {
-      override def transform(
-        instance: StandardScaler,
-        transformParameters: ParameterMap,
-        input: DataSet[T])
-      : DataSet[T] = {
-        ...
-      }
-    }
-  }
-}
-
-{% endhighlight %}
-
-Note that `StandardScaler` does **not** override the `fit` method of `Estimator` or the `transform`
-method of `Transformer`. Rather, its implementations of `FitOperation` and `TransformOperation`
-override their respective `fit` and `transform` methods, which are then called by the `fit` and
-`transform` methods of `Estimator` and `Transformer`.  Similarly, a class that implements
-`Predictor` should define an implicit `PredictOperation` object inside its companion object.
-
-#### Types and type safety
-
-Apart from the `fit` and `transform` operations that we listed above, the `StandardScaler` also
-provides `fit` and `transform` operations for input of type `LabeledVector`.
-This allows us to use the  algorithm for input that is labeled or unlabeled, and this happens
-automatically, depending on  the type of the input that we give to the fit and transform
-operations. The correct implicit operation is chosen by the compiler, depending on the input type.
-
-If we try to call the `fit` or `transform` methods with types that are not supported we will get a 
-runtime error before the job is launched. 
-While it would be possible to catch these kinds of errors at compile time as well, the error 
-messages that we are able to provide the user would be much less informative, which is why we chose 
-to throw runtime exceptions instead.
-
-### Chaining
-
-Chaining is achieved by calling `chainTransformer` or `chainPredictor` on an object
-of a class that implements `Transformer`. These methods return a `ChainedTransformer` or
-`ChainedPredictor` object respectively. As we mentioned, `ChainedTransformer` objects can be
-chained further, while `ChainedPredictor` objects cannot. These classes take care of applying
-fit, transform, and predict operations for a pair of successive transformers or
-a transformer and a predictor. They also act recursively if the length of the
-chain is larger than two, since every `ChainedTransformer` defines a `transform` and `fit`
-operation that can be further chained with more transformers or a predictor.
-
-It is important to note that developers and users do not need to worry about chaining when
-implementing their algorithms, all this is handled automatically by FlinkML.
-
-### How to Implement a Pipeline Operator
-
-In order to support FlinkML's pipelining, algorithms have to adhere to a certain design pattern, which we will describe in this section.
-Let's assume that we want to implement a pipeline operator which changes the mean of your data.
-Since centering data is a common pre-processing step in many analysis pipelines, we will implement it as a `Transformer`.
-Therefore, we first create a `MeanTransformer` class which inherits from `Transformer`
-
-{% highlight scala %}
-class MeanTransformer extends Transformer[MeanTransformer] {}
-{% endhighlight %}
-
-Since we want to be able to configure the mean of the resulting data, we have to add a configuration parameter.
-
-{% highlight scala %}
-class MeanTransformer extends Transformer[MeanTransformer] {
-  def setMean(mean: Double): this.type = {
-    parameters.add(MeanTransformer.Mean, mean)
-    this
-  }
-}
-
-object MeanTransformer {
-  case object Mean extends Parameter[Double] {
-    override val defaultValue: Option[Double] = Some(0.0)
-  }
-  
-  def apply(): MeanTransformer = new MeanTransformer
-}
-{% endhighlight %}
-
-Parameters are defined in the companion object of the transformer class and extend the `Parameter` class.
-Since the parameter instances are supposed to act as immutable keys for a parameter map, they should be implemented as `case objects`.
-The default value will be used if no other value has been set by the user of this component.
-If no default value has been specified, meaning that `defaultValue = None`, then the algorithm has to handle this situation accordingly.
-
-We can now instantiate a `MeanTransformer` object and set the mean value of the transformed data.
-But we still have to implement how the transformation works.
-The workflow can be separated into two phases.
-Within the first phase, the transformer learns the mean of the given training data.
-This knowledge can then be used in the second phase to transform the provided data with respect to the configured resulting mean value.
-
-The learning of the mean can be implemented within the `fit` operation of our `Transformer`, which it inherited from `Estimator`.
-Within the `fit` operation, a pipeline component is trained with respect to the given training data.
-The algorithm is, however, **not** implemented by overriding the `fit` method but by providing an implementation of a corresponding `FitOperation` for the correct type.
-Taking a look at the definition of the `fit` method in `Estimator`, which is the parent class of `Transformer`, reveals why this is the case.
-
-{% highlight scala %}
-trait Estimator[Self] extends WithParameters with Serializable {
-  that: Self =>
-
-  def fit[Training](
-      training: DataSet[Training],
-      fitParameters: ParameterMap = ParameterMap.Empty)
-      (implicit fitOperation: FitOperation[Self, Training]): Unit = {
-    FlinkMLTools.registerFlinkMLTypes(training.getExecutionEnvironment)
-    fitOperation.fit(this, fitParameters, training)
-  }
-}
-{% endhighlight %}
-
-We see that the `fit` method is called with an input data set of type `Training` and an optional parameter map, and that its second parameter list takes an implicit parameter of type `FitOperation`.
-Within the body of the function, first some machine learning types are registered and then the `fit` method of the `FitOperation` parameter is called.
-The instance gives itself, the parameter map and the training data set as parameters to the method.
-Thus, all the program logic takes place within the `FitOperation`.
-
-The `FitOperation` has two type parameters.
-The first defines the pipeline operator type for which this `FitOperation` shall work and the second type parameter defines the type of the data set elements.
-If we first wanted to implement the `MeanTransformer` to work on `DenseVector`, we would thus have to provide an implementation for `FitOperation[MeanTransformer, DenseVector]`.
- 
-{% highlight scala %}
-val denseVectorMeanFitOperation = new FitOperation[MeanTransformer, DenseVector] {
-  override def fit(instance: MeanTransformer, fitParameters: ParameterMap, input: DataSet[DenseVector]) : Unit = {
-    import org.apache.flink.ml.math.Breeze._
-    val meanTrainingData: DataSet[DenseVector] = input
-      .map{ x => (x.asBreeze, 1) }
-      .reduce{
-        (left, right) => 
-          (left._1 + right._1, left._2 + right._2) 
-      }
-      .map{ p => (p._1/p._2).fromBreeze }
-  }
-}
-{% endhighlight %}
-
-A `FitOperation[T, I]` has a `fit` method which is called with an instance of type `T`, a parameter map and an input `DataSet[I]`.
-In our case `T=MeanTransformer` and `I=DenseVector`.
-The parameter map is necessary if our fit step depends on some parameter values which were not given directly at creation time of the `Transformer`.
-The `FitOperation` of the `MeanTransformer` sums the `DenseVector` instances of the given input data set up and divides the result by the total number of vectors.
-That way, we obtain a `DataSet[DenseVector]` with a single element which is the mean value.
-
-But if we look closely at the implementation, we see that the result of the mean computation is never stored anywhere.
-If we want to use this knowledge in a later step to adjust the mean of some other input, we have to keep it around.
-And here is where the parameter of type `MeanTransformer` which is given to the `fit` method comes into play.
-We can use this instance to store state, which is used by a subsequent `transform` operation which works on the same object.
-But first we have to extend `MeanTransformer` by a member field and then adjust the `FitOperation` implementation.
-
-{% highlight scala %}
-class MeanTransformer extends Transformer[MeanTransformer] {
-  var meanOption: Option[DataSet[DenseVector]] = None
-
-  def setMean(mean: Double): this.type = {
-    parameters.add(MeanTransformer.Mean, mean)
-    this
-  }
-}
-
-val denseVectorMeanFitOperation = new FitOperation[MeanTransformer, DenseVector] {
-  override def fit(instance: MeanTransformer, fitParameters: ParameterMap, input: DataSet[DenseVector]) : Unit = {
-    import org.apache.flink.ml.math.Breeze._
-    
-    instance.meanOption = Some(input
-      .map{ x => (x.asBreeze, 1) }
-      .reduce{
-        (left, right) => 
-          (left._1 + right._1, left._2 + right._2) 
-      }
-      .map{ p => (p._1/p._2).fromBreeze })
-  }
-}
-{% endhighlight %}
-
-If we look at the `transform` method in `Transformer`, we will see that we also need an implementation of `TransformOperation`.
-A possible implementation of the mean transformation could look like the following.
-
-{% highlight scala %}
-
-val denseVectorMeanTransformOperation = new TransformOperation[MeanTransformer, DenseVector, DenseVector] {
-  override def transform(
-      instance: MeanTransformer, 
-      transformParameters: ParameterMap, 
-      input: DataSet[DenseVector]) 
-    : DataSet[DenseVector] = {
-    val resultingParameters = instance.parameters ++ transformParameters
-    
-    val resultingMean = resultingParameters(MeanTransformer.Mean)
-    
-    instance.meanOption match {
-      case Some(trainingMean) => {
-        input.map{ new MeanTransformMapper(resultingMean) }.withBroadcastSet(trainingMean, "trainingMean")
-      }
-      case None => throw new RuntimeException("MeanTransformer has not been fitted to data.")
-    }
-  }
-}
-
-class MeanTransformMapper(resultingMean: Double) extends RichMapFunction[DenseVector, DenseVector] {
-  var trainingMean: DenseVector = null
-
-  override def open(parameters: Configuration): Unit = {
-    trainingMean = getRuntimeContext().getBroadcastVariable[DenseVector]("trainingMean").get(0)
-  }
-  
-  override def map(vector: DenseVector): DenseVector = {
-    import org.apache.flink.ml.math.Breeze._
-    
-    val result = vector.asBreeze - trainingMean.asBreeze + resultingMean
-    
-    result.fromBreeze
-  }
-}
-{% endhighlight %}
-
-Now we have everything implemented to fit our `MeanTransformer` to a training data set of `DenseVector` instances and to transform them.
-However, when we execute the `fit` operation
-
-{% highlight scala %}
-val trainingData: DataSet[DenseVector] = ...
-val meanTransformer = MeanTransformer()
-
-meanTransformer.fit(trainingData)
-{% endhighlight %}
-
-we receive the following error at runtime: `"There is no FitOperation defined for class MeanTransformer which trains on a DataSet[org.apache.flink.ml.math.DenseVector]"`.
-The reason is that the Scala compiler could not find a fitting `FitOperation` value with the right type parameters for the implicit parameter of the `fit` method.
-Therefore, it chose a fallback implicit value which gives you this error message at runtime.
-In order to make the compiler aware of our implementation, we have to define it as an implicit value and put it in the scope of `MeanTransformer`'s companion object.
-
-{% highlight scala %}
-object MeanTransformer {
-  implicit val denseVectorMeanFitOperation = new FitOperation[MeanTransformer, DenseVector] ...
-  
-  implicit val denseVectorMeanTransformOperation = new TransformOperation[MeanTransformer, DenseVector, DenseVector] ...
-}
-{% endhighlight %}
-
-Now we can call `fit` and `transform` of our `MeanTransformer` with `DataSet[DenseVector]` as input.
-Furthermore, we can now use this transformer as part of an analysis pipeline where we have a `DenseVector` as input and expected output.
-
-{% highlight scala %}
-val trainingData: DataSet[DenseVector] = ...
-
-val mean = MeanTransformer().setMean(1.0)
-val polyFeatures = PolynomialFeatures().setDegree(3)
-
-val pipeline = mean.chainTransformer(polyFeatures)
-
-pipeline.fit(trainingData)
-{% endhighlight %}
-
-It is noteworthy that there is no additional code needed to enable chaining.
-The system automatically constructs the pipeline logic using the operations of the individual components.
-
-So far everything works fine with `DenseVector`.
-But what happens, if we call our transformer with `LabeledVector` instead?
-{% highlight scala %}
-val trainingData: DataSet[LabeledVector] = ...
-
-val mean = MeanTransformer()
-
-mean.fit(trainingData)
-{% endhighlight %}
-
-As before we see the following exception upon execution of the program: `"There is no FitOperation defined for class MeanTransformer which trains on a DataSet[org.apache.flink.ml.common.LabeledVector]"`.
-It is noteworthy that this exception is thrown in the pre-flight phase, which means that the job has not been submitted to the runtime system.
-This has the advantage that you won't see a job which runs for a couple of days and then fails because of an incompatible pipeline component.
-Type compatibility is, thus, checked at the very beginning for the complete job.
-
-In order to make the `MeanTransformer` work on `LabeledVector` as well, we have to provide the corresponding operations.
-Consequently, we have to define a `FitOperation[MeanTransformer, LabeledVector]` and `TransformOperation[MeanTransformer, LabeledVector, LabeledVector]` as implicit values in the scope of `MeanTransformer`'s companion object.
-
-{% highlight scala %}
-object MeanTransformer {
-  implicit val labeledVectorFitOperation = new FitOperation[MeanTransformer, LabeledVector] ...
-  
-  implicit val labeledVectorTransformOperation = new TransformOperation[MeanTransformer, LabeledVector, LabeledVector] ...
-}
-{% endhighlight %}
-
-If we wanted to implement a `Predictor` instead of a `Transformer`, then we would have to provide a `FitOperation`, too.
-Moreover, a `Predictor` requires a `PredictOperation` which implements how predictions are calculated from testing data.  
-
+<meta http-equiv="refresh" content="1; url={{ site.baseurl }}/apis/batch/libs/ml/pipelines.html" />
 
+The *pipelines guide* has been moved. Redirecting to [{{ site.baseurl }}/apis/batch/libs/ml/pipelines.html]({{ site.baseurl }}/apis/batch/libs/ml/pipelines.html) in 1 second.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/flink/blob/35ec26cd/docs/libs/ml/polynomial_features.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/polynomial_features.md b/docs/libs/ml/polynomial_features.md
index 27fb1e9..0d4f628 100644
--- a/docs/libs/ml/polynomial_features.md
+++ b/docs/libs/ml/polynomial_features.md
@@ -1,10 +1,5 @@
 ---
-mathjax: include
 title: FlinkML - Polynomial Features
-# Sub navigation
-sub-nav-group: batch
-sub-nav-parent: flinkml
-sub-nav-title: Polynomial Features
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -25,87 +20,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-* This will be replaced by the TOC
-{:toc}
+<meta http-equiv="refresh" content="1; url={{ site.baseurl }}/apis/batch/libs/ml/polynomial_features.html" />
 
-## Description
-
-The polynomial features transformer maps a vector into the polynomial feature space of degree $d$.
-The dimension of the input vector determines the number of polynomial factors whose values are the respective vector entries.
-Given a vector $(x, y, z, \ldots)^T$ the resulting feature vector looks like:
-
-$$\left(x, y, z, x^2, xy, y^2, yz, z^2, x^3, x^2y, x^2z, xy^2, xyz, xz^2, y^3, \ldots\right)^T$$
-
-Flink's implementation orders the polynomials in decreasing order of their degree.
-
-Given the vector $\left(3,2\right)^T$, the polynomial features vector of degree 3 would look like
- 
- $$\left(3^3, 3^2\cdot2, 3\cdot2^2, 2^3, 3^2, 3\cdot2, 2^2, 3, 2\right)^T$$
-
-This transformer can be prepended to all `Transformer` and `Predictor` implementations which expect an input of type `LabeledVector` or any sub-type of `Vector`.
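-
-The ordering can be reproduced with the following plain-Scala sketch (illustrative only, not FlinkML's implementation; `polynomialFeatures` and `monomials` are made-up helpers):
-
-{% highlight scala %}
-// All monomials of exactly degree d over the entries of v, allowing
-// repeated factors, e.g. x^2, xy, y^2 for d = 2 and a 2-dimensional input.
-def monomials(v: Seq[Double], d: Int): Seq[Double] =
-  if (d == 0) Seq(1.0)
-  else v.indices.flatMap { i =>
-    monomials(v.drop(i), d - 1).map(v(i) * _)
-  }
-
-// Concatenate the monomials in decreasing order of their degree.
-def polynomialFeatures(v: Seq[Double], degree: Int): Seq[Double] =
-  (degree to 1 by -1).flatMap(d => monomials(v, d))
-
-// polynomialFeatures(Seq(3.0, 2.0), 3)
-// == Vector(27.0, 18.0, 12.0, 8.0, 9.0, 6.0, 4.0, 3.0, 2.0)
-{% endhighlight %}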
-
-## Operations
-
-`PolynomialFeatures` is a `Transformer`.
-As such, it supports the `fit` and `transform` operations.
-
-### Fit
-
-PolynomialFeatures is not trained on data and, thus, supports all types of input data.
-
-### Transform
-
-PolynomialFeatures transforms all subtypes of `Vector` and `LabeledVector` into their respective types: 
-
-* `transform[T <: Vector]: DataSet[T] => DataSet[T]`
-* `transform: DataSet[LabeledVector] => DataSet[LabeledVector]`
-
-## Parameters
-
-The polynomial features transformer can be controlled by the following parameters:
-
-<table class="table table-bordered">
-    <thead>
-      <tr>
-        <th class="text-left" style="width: 20%">Parameters</th>
-        <th class="text-center">Description</th>
-      </tr>
-    </thead>
-
-    <tbody>
-      <tr>
-        <td><strong>Degree</strong></td>
-        <td>
-          <p>
-            The maximum polynomial degree. 
-            (Default value: <strong>10</strong>)
-          </p>
-        </td>
-      </tr>
-    </tbody>
-  </table>
-
-## Examples
-
-{% highlight scala %}
-// Obtain the training data set
-val trainingDS: DataSet[LabeledVector] = ...
-
-// Setup polynomial feature transformer of degree 3
-val polyFeatures = PolynomialFeatures()
-  .setDegree(3)
-
-// Setup the multiple linear regression learner
-val mlr = MultipleLinearRegression()
-
-// Control the learner via the parameter map
-val parameters = ParameterMap()
-  .add(MultipleLinearRegression.Iterations, 20)
-  .add(MultipleLinearRegression.Stepsize, 0.5)
-
-// Create pipeline PolynomialFeatures -> MultipleLinearRegression
-val pipeline = polyFeatures.chainPredictor(mlr)
-
-// train the model
-pipeline.fit(trainingDS)
-{% endhighlight %}
+The *polynomial features guide* has been moved. Redirecting to [{{ site.baseurl }}/apis/batch/libs/ml/polynomial_features.html]({{ site.baseurl }}/apis/batch/libs/ml/polynomial_features.html) in 1 second.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/flink/blob/35ec26cd/docs/libs/ml/quickstart.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/quickstart.md b/docs/libs/ml/quickstart.md
index 60ce73b..88d46a1 100644
--- a/docs/libs/ml/quickstart.md
+++ b/docs/libs/ml/quickstart.md
@@ -1,10 +1,5 @@
 ---
-mathjax: include
 title: FlinkML - Quickstart Guide
-# Sub navigation
-sub-nav-group: batch
-sub-nav-parent: flinkml
-sub-nav-title: Quickstart Guide
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -25,220 +20,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-* This will be replaced by the TOC
-{:toc}
+<meta http-equiv="refresh" content="1; url={{ site.baseurl }}/apis/batch/libs/ml/quickstart.html" />
 
-## Introduction
-
-FlinkML is designed to make learning from your data a straightforward process, abstracting away
-the complexities that usually come with big data learning tasks. In this
-quick-start guide we will show just how easy it is to solve a simple supervised learning problem
-using FlinkML. But first, some basics; feel free to skip the next few lines if you're already
-familiar with Machine Learning (ML).
-
-As defined by Murphy [[1]](#murphy) ML deals with detecting patterns in data, and using those
-learned patterns to make predictions about the future. We can categorize most ML algorithms into
-two major categories: Supervised and Unsupervised Learning.
-
-* **Supervised Learning** deals with learning a function (mapping) from a set of inputs
-(features) to a set of outputs. The learning is done using a *training set* of (input,
-output) pairs that we use to approximate the mapping function. Supervised learning problems are
-further divided into classification and regression problems. In classification problems we try to
-predict the *class* that an example belongs to, for example whether a user is going to click on
-an ad or not. Regression problems one the other hand, are about predicting (real) numerical
-values, often called the dependent variable, for example what the temperature will be tomorrow.
-
-* **Unsupervised Learning** deals with discovering patterns and regularities in the data. An example
-of this would be *clustering*, where we try to discover groupings of the data from the
-descriptive features. Unsupervised learning can also be used for feature selection, for example
-through [principal components analysis](https://en.wikipedia.org/wiki/Principal_component_analysis).
-
-## Linking with FlinkML
-
-In order to use FlinkML in your project, first you have to
-[set up a Flink program](http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#linking-with-flink).
-Next, you have to add the FlinkML dependency to the `pom.xml` of your project:
-
-{% highlight xml %}
-<dependency>
-  <groupId>org.apache.flink</groupId>
-  <artifactId>flink-ml{{ site.scala_version_suffix }}</artifactId>
-  <version>{{ site.version }}</version>
-</dependency>
-{% endhighlight %}
-
-## Loading data
-
-To load data to be used with FlinkML we can use the ETL capabilities of Flink, or specialized
-functions for formatted data, such as the LibSVM format. For supervised learning problems it is
-common to use the `LabeledVector` class to represent the `(label, features)` examples. A `LabeledVector`
-object will have a FlinkML `Vector` member representing the features of the example and a `Double`
-member which represents the label, which could be the class in a classification problem, or the dependent
-variable for a regression problem.
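-
-For instance, a single example with class label 1.0 and three made-up feature values could be constructed like this:
-
-{% highlight scala %}
-import org.apache.flink.ml.common.LabeledVector
-import org.apache.flink.ml.math.DenseVector
-
-// label = 1.0, features = (30.0, 64.0, 1.0)
-val example = LabeledVector(1.0, DenseVector(Array(30.0, 64.0, 1.0)))
-{% endhighlight %}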
-
-As an example, we can use Haberman's Survival Data Set, which you can
-[download from the UCI ML repository](http://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data).
-This dataset *"contains cases from a study conducted on the survival of patients who had undergone
-surgery for breast cancer"*. The data comes in a comma-separated file in which the first 3 columns
-are the features and the 4th column is the class label, indicating whether the patient
-survived 5 years or longer (label 1) or died within 5 years (label 2). You can check the [UCI
-page](https://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival) for more information on the data.
-
-We can load the data as a `DataSet[String]` first:
-
-{% highlight scala %}
-
-import org.apache.flink.api.scala.ExecutionEnvironment
-
-val env = ExecutionEnvironment.getExecutionEnvironment
-
-val survival = env.readCsvFile[(String, String, String, String)]("/path/to/haberman.data")
-
-{% endhighlight %}
-
-We can now transform the data into a `DataSet[LabeledVector]`. This will allow us to use the
-dataset with the FlinkML classification algorithms. We know that the 4th element of the dataset
-is the class label, and the rest are features, so we can build `LabeledVector` elements like this:
-
-{% highlight scala %}
-
-import org.apache.flink.ml.common.LabeledVector
-import org.apache.flink.ml.math.DenseVector
-
-val survivalLV = survival
-  .map{tuple =>
-    val list = tuple.productIterator.toList
-    val numList = list.map(_.asInstanceOf[String].toDouble)
-    LabeledVector(numList(3), DenseVector(numList.take(3).toArray))
-  }
-
-{% endhighlight %}
-
-We could now use this data to train a learner; however, we will use another dataset to exemplify
-building a learner, which also allows us to show how to import other dataset formats.
-
-**LibSVM files**
-
-A common format for ML datasets is the LibSVM format, and a number of datasets in that format can be
-found [on the LibSVM datasets website](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). FlinkML provides utilities for loading
-datasets using the LibSVM format through the `readLibSVM` function available through the `MLUtils`
-object.
-You can also save datasets in the LibSVM format using the `writeLibSVM` function.
-Let's import the svmguide1 dataset. You can download the
-[training set here](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/svmguide1)
-and the [test set here](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/svmguide1.t).
-This is an astroparticle binary classification dataset, used by Hsu et al. [[3]](#hsu) in their 
-practical Support Vector Machine (SVM) guide. It contains 4 numerical features, and the class label.
-
-We can then simply import the dataset using:
-
-{% highlight scala %}
-
-import org.apache.flink.ml.MLUtils
-
-val astroTrain: DataSet[LabeledVector] = MLUtils.readLibSVM("/path/to/svmguide1")
-val astroTest: DataSet[LabeledVector] = MLUtils.readLibSVM("/path/to/svmguide1.t")
-
-{% endhighlight %}
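-
-Conversely, a `DataSet[LabeledVector]` can be written back out in the LibSVM format (the output path below is illustrative):
-
-{% highlight scala %}
-MLUtils.writeLibSVM("/path/to/output", astroTrain)
-{% endhighlight %}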
-
-This gives us two `DataSet[LabeledVector]` objects that we will use in the following section to
-create a classifier.
-
-## Classification
-
-Once we have imported the dataset, we can train a `Predictor` such as a linear SVM classifier.
-We can set a number of parameters for the classifier. Here we set the `Blocks` parameter,
-which controls how the input is split by the underlying CoCoA algorithm [[2]](#jaggi). The
-regularization parameter determines the amount of $l_2$ regularization applied, which is used
-to avoid overfitting. The step size parameter sets the initial step size, which determines the
-contribution of the weight vector updates to the next weight vector value.
-
-{% highlight scala %}
-
-import org.apache.flink.ml.classification.SVM
-
-val svm = SVM()
-  .setBlocks(env.getParallelism)
-  .setIterations(100)
-  .setRegularization(0.001)
-  .setStepsize(0.1)
-  .setSeed(42)
-
-svm.fit(astroTrain)
-
-{% endhighlight %}
-
-We can now make predictions on the test set.
-
-{% highlight scala %}
-
-val predictionPairs = svm.predict(astroTest)
-
-{% endhighlight %}
-
-Next we will see how we can pre-process our data, and use the ML pipelines capabilities of FlinkML.
-
-## Data pre-processing and pipelines
-
-A pre-processing step that is often encouraged [[3]](#hsu) when using SVM classification is scaling
-the input features to the [0, 1] range, in order to avoid features with extreme values
-dominating the rest.
-FlinkML has a number of `Transformers` such as `MinMaxScaler` that are used to pre-process data,
-and a key feature is the ability to chain `Transformers` and `Predictors` together. This allows
-us to run the same pipeline of transformations and make predictions on the train and test data in
-a straight-forward and type-safe manner. You can read more on the pipeline system of FlinkML
-[in the pipelines documentation](pipelines.html).
-
-Let us first create a normalizing transformer for the features in our dataset, and chain it to a
-new SVM classifier.
-
-{% highlight scala %}
-
-import org.apache.flink.ml.preprocessing.MinMaxScaler
-
-val scaler = MinMaxScaler()
-
-val scaledSVM = scaler.chainPredictor(svm)
-
-{% endhighlight %}
-
-We can now use our newly created pipeline to make predictions on the test set.
-First we call fit again, to train the scaler and the SVM classifier.
-The data of the test set will then be automatically scaled before being passed on to the SVM to
-make predictions.
-
-{% highlight scala %}
-
-scaledSVM.fit(astroTrain)
-
-val predictionPairsScaled: DataSet[(Double, Double)] = scaledSVM.predict(astroTest)
-
-{% endhighlight %}
-
-The scaled inputs should give us better prediction performance.
-The result of the prediction on `LabeledVector`s is a data set of tuples where the first entry denotes the true label value and the second entry is the predicted label value.
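-
-For example, a simple accuracy measure can be derived from these tuples, assuming the true labels use the same encoding as the predicted ones (each `count()` call triggers execution of the job):
-
-{% highlight scala %}
-val total = predictionPairsScaled.count()
-val correct = predictionPairsScaled.filter(pair => pair._1 == pair._2).count()
-
-println(s"Accuracy: ${correct.toDouble / total}")
-{% endhighlight %}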
-
-## Where to go from here
-
-This quickstart guide can act as an introduction to the basic concepts of FlinkML, but there's a lot
-more you can do.
-We recommend going through the [FlinkML documentation](index.html), and trying out the different
-algorithms.
-A very good way to get started is to play around with interesting datasets from the UCI ML
-repository and the LibSVM datasets.
-Tackling an interesting problem from a website like [Kaggle](https://www.kaggle.com) or
-[DrivenData](http://www.drivendata.org/) is also a great way to learn by competing with other
-data scientists.
-If you would like to contribute some new algorithms take a look at our
-[contribution guide](contribution_guide.html).
-
-**References**
-
-<a name="murphy"></a>[1] Murphy, Kevin P. *Machine learning: a probabilistic perspective.* MIT 
-press, 2012.
-
-<a name="jaggi"></a>[2] Jaggi, Martin, et al. *Communication-efficient distributed dual 
-coordinate ascent.* Advances in Neural Information Processing Systems. 2014.
-
-<a name="hsu"></a>[3] Hsu, Chih-Wei, Chih-Chung Chang, and Chih-Jen Lin.
- *A practical guide to support vector classification.* 2003.
+The *quickstart guide* has been moved. Redirecting to [{{ site.baseurl }}/apis/batch/libs/ml/quickstart.html]({{ site.baseurl }}/apis/batch/libs/ml/quickstart.html) in 1 second.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/flink/blob/35ec26cd/docs/libs/ml/standard_scaler.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/standard_scaler.md b/docs/libs/ml/standard_scaler.md
index f6d7b62..dde3818 100644
--- a/docs/libs/ml/standard_scaler.md
+++ b/docs/libs/ml/standard_scaler.md
@@ -1,10 +1,5 @@
 ---
-mathjax: include
 title: FlinkML - Standard Scaler
-# Sub navigation
-sub-nav-group: batch
-sub-nav-parent: flinkml
-sub-nav-title: Standard Scaler
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -25,92 +20,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-* This will be replaced by the TOC
-{:toc}
+<meta http-equiv="refresh" content="1; url={{ site.baseurl }}/apis/batch/libs/ml/standard_scaler.html" />
 
-## Description
-
-The standard scaler scales the given data set so that all features will have a user-specified mean and standard deviation.
-In case the user does not provide specific values, the standard scaler transforms the features of the input data set to have mean equal to 0 and standard deviation equal to 1.
-Given a set of input data $x_1, x_2,... x_n$, with mean:
-
-$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_{i}$$
-
-and standard deviation:
-
-$$\sigma_{x}=\sqrt{ \frac{1}{n} \sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}$$
-
-the scaled data set $z_1, z_2,...,z_n$ is computed as:
-
-$$z_{i}= \textit{std} \cdot \frac{x_{i} - \bar{x}}{\sigma_{x}} + \textit{mean}$$
-
-where $\textit{std}$ and $\textit{mean}$ are the user-specified values for the standard deviation and mean.
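-
-As a brief worked example: with user-specified values $mean = 10$ and $std = 2$, a feature value $x_{i} = 5$ drawn from data with $\bar{x} = 3$ and $\sigma_{x} = 2$ is mapped to $z_{i} = 2 \cdot \frac{5 - 3}{2} + 10 = 12$.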
-
-## Operations
-
-`StandardScaler` is a `Transformer`.
-As such, it supports the `fit` and `transform` operations.
-
-### Fit
-
-StandardScaler is trained on all subtypes of `Vector` or `LabeledVector`: 
-
-* `fit[T <: Vector]: DataSet[T] => Unit` 
-* `fit: DataSet[LabeledVector] => Unit`
-
-### Transform
-
-StandardScaler transforms all subtypes of `Vector` or `LabeledVector` into the respective type: 
-
-* `transform[T <: Vector]: DataSet[T] => DataSet[T]` 
-* `transform: DataSet[LabeledVector] => DataSet[LabeledVector]`
-
-## Parameters
-
-The standard scaler implementation can be controlled by the following two parameters:
-
- <table class="table table-bordered">
-  <thead>
-    <tr>
-      <th class="text-left" style="width: 20%">Parameters</th>
-      <th class="text-center">Description</th>
-    </tr>
-  </thead>
-
-  <tbody>
-    <tr>
-      <td><strong>Mean</strong></td>
-      <td>
-        <p>
-          The mean of the scaled data set. (Default value: <strong>0.0</strong>)
-        </p>
-      </td>
-    </tr>
-    <tr>
-      <td><strong>Std</strong></td>
-      <td>
-        <p>
-          The standard deviation of the scaled data set. (Default value: <strong>1.0</strong>)
-        </p>
-      </td>
-    </tr>
-  </tbody>
-</table>
-
-## Examples
-
-{% highlight scala %}
-// Create standard scaler transformer
-val scaler = StandardScaler()
-  .setMean(10.0)
-  .setStd(2.0)
-
-// Obtain data set to be scaled
-val dataSet: DataSet[Vector] = ...
-
-// Learn the mean and standard deviation of the training data
-scaler.fit(dataSet)
-
-// Scale the provided data set to have mean=10.0 and std=2.0
-val scaledDS = scaler.transform(dataSet)
-{% endhighlight %}
+The *standard scaler guide* has been moved. Redirecting to [{{ site.baseurl }}/apis/batch/libs/ml/standard_scaler.html]({{ site.baseurl }}/apis/batch/libs/ml/standard_scaler.html) in 1 second.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/flink/blob/35ec26cd/docs/libs/ml/svm.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/svm.md b/docs/libs/ml/svm.md
index b149d31..7b75aa0 100644
--- a/docs/libs/ml/svm.md
+++ b/docs/libs/ml/svm.md
@@ -1,10 +1,5 @@
 ---
-mathjax: include
 title: FlinkML - SVM using CoCoA
-# Sub navigation
-sub-nav-group: batch
-sub-nav-parent: flinkml
-sub-nav-title: SVM (CoCoA)
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -25,199 +20,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-* This will be replaced by the TOC
-{:toc}
+<meta http-equiv="refresh" content="1; url={{ site.baseurl }}/apis/batch/libs/ml/svm.html" />
 
-## Description
-
-Implements an SVM with soft-margin using the communication-efficient distributed dual coordinate
-ascent algorithm with hinge-loss function.
-The algorithm solves the following minimization problem:
-
-$$\min_{\mathbf{w} \in \mathbb{R}^d} \frac{\lambda}{2} \left\lVert \mathbf{w} \right\rVert^2 + \frac{1}{n} \sum_{i=1}^n l_{i}\left(\mathbf{w}^T\mathbf{x}_i\right)$$
-
-with $\mathbf{w}$ being the weight vector, $\lambda$ being the regularization constant,
-$$\mathbf{x}_i \in \mathbb{R}^d$$ being the data points and $$l_{i}$$ being the convex loss
-functions, which can also depend on the labels $$y_{i} \in \mathbb{R}$$.
-In the current implementation the regularizer is the $\ell_2$-norm and the loss functions are the hinge-loss functions:
-
-  $$l_{i} = \max\left(0, 1 - y_{i} \mathbf{w}^T\mathbf{x}_i \right)$$
-
-With these choices, the problem definition is equivalent to an SVM with soft margin.
-Thus, the algorithm effectively trains a soft-margin SVM.
-
-The minimization problem is solved by applying stochastic dual coordinate ascent (SDCA).
-In order to make the algorithm efficient in a distributed setting, the CoCoA algorithm calculates
-several iterations of SDCA locally on a data block before merging the local updates into a
-valid global state.
-This state is redistributed to the different data partitions where the next round of local SDCA
-iterations is then executed.
-The number of outer iterations and local SDCA iterations control the overall network costs, because
-network communication is only required once per outer iteration.
-The local SDCA iterations are embarrassingly parallel once the individual data partitions have been
-distributed across the cluster.
-
-The implementation of this algorithm is based on the work of
-[Jaggi et al.](http://arxiv.org/abs/1409.1458)
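-
-The control flow described above can be sketched as follows. This is an illustration of the structure only, not FlinkML's implementation; the SDCA math itself is hidden behind placeholder functions:
-
-{% highlight scala %}
-type Weights = Array[Double]
-type Block = Seq[(Array[Double], Double)] // (features, label) pairs
-
-// Placeholder: several local SDCA iterations on one data block.
-def runLocalSdca(block: Block, w: Weights, localIterations: Int): Weights = w
-
-// Placeholder: reduce the local updates into a valid global state.
-def merge(w: Weights, updates: Seq[Weights]): Weights = w
-
-def cocoa(blocks: Seq[Block], dim: Int,
-          outerIterations: Int, localIterations: Int): Weights = {
-  var w: Weights = Array.fill(dim)(0.0)
-  for (_ <- 1 to outerIterations) {
-    // The local iterations are embarrassingly parallel across the blocks ...
-    val localUpdates = blocks.map(block => runLocalSdca(block, w, localIterations))
-    // ... and only this merge/redistribute step requires network communication.
-    w = merge(w, localUpdates)
-  }
-  w
-}
-{% endhighlight %}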
-
-## Operations
-
-`SVM` is a `Predictor`.
-As such, it supports the `fit` and `predict` operations.
-
-### Fit
-
-SVM is trained given a set of `LabeledVector`:
-
-* `fit: DataSet[LabeledVector] => Unit`
-
-### Predict
-
-SVM predicts for all subtypes of FlinkML's `Vector` the corresponding class label:
-
-* `predict[T <: Vector]: DataSet[T] => DataSet[(T, Double)]`, where the `(T, Double)` tuple
-  corresponds to (original_features, predicted_label)
-
-If we call `predict` with a `DataSet[(Vector, Double)]`, we make a prediction on the class label
-for each example, and return a `DataSet[(Double, Double)]`. In each tuple the first element
-is the true value, as provided by the input `DataSet[(Vector, Double)]`, and the second element
-is the predicted value. You can then use these `(truth, prediction)` tuples to evaluate
-the algorithm's performance.
-
-* `predict: DataSet[(Vector, Double)] => DataSet[(Double, Double)]`
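-
-For example, assuming `svm` has already been fitted and `labeledTestDS` is a (hypothetical) `DataSet[(Vector, Double)]` of (features, truth) pairs:
-
-{% highlight scala %}
-val evaluationPairs: DataSet[(Double, Double)] = svm.predict(labeledTestDS)
-{% endhighlight %}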
-
-## Parameters
-
-The SVM implementation can be controlled by the following parameters:
-
-<table class="table table-bordered">
-<thead>
-  <tr>
-    <th class="text-left" style="width: 20%">Parameters</th>
-    <th class="text-center">Description</th>
-  </tr>
-</thead>
-
-<tbody>
-  <tr>
-    <td><strong>Blocks</strong></td>
-    <td>
-      <p>
-        Sets the number of blocks into which the input data will be split.
-        On each block the local stochastic dual coordinate ascent method is executed.
-        This number should be set to at least the degree of parallelism.
-        If no value is specified, then the parallelism of the input DataSet is used as the number of blocks.
-        (Default value: <strong>None</strong>)
-      </p>
-    </td>
-  </tr>
-  <tr>
-    <td><strong>Iterations</strong></td>
-    <td>
-      <p>
-        Defines the maximum number of iterations of the outer loop method.
-        In other words, it defines how often the SDCA method is applied to the blocked data.
-        After each iteration, the locally computed weight vector updates have to be reduced to update the global weight vector value.
-        The new weight vector is broadcast to all SDCA tasks at the beginning of each iteration.
-        (Default value: <strong>10</strong>)
-      </p>
-    </td>
-  </tr>
-  <tr>
-    <td><strong>LocalIterations</strong></td>
-    <td>
-      <p>
-        Defines the maximum number of SDCA iterations.
-        In other words, it defines how many data points are drawn from each local data block to calculate the stochastic dual coordinate ascent.
-        (Default value: <strong>10</strong>)
-      </p>
-    </td>
-  </tr>
-  <tr>
-    <td><strong>Regularization</strong></td>
-    <td>
-      <p>
-        Defines the regularization constant of the SVM algorithm.
-        The higher the value, the smaller the 2-norm of the weight vector will be.
-        In the case of an SVM with hinge loss, this means that the margin will be wider even though it might contain some misclassified examples.
-        (Default value: <strong>1.0</strong>)
-      </p>
-    </td>
-  </tr>
-  <tr>
-    <td><strong>Stepsize</strong></td>
-    <td>
-      <p>
-        Defines the initial step size for the updates of the weight vector.
-        The larger the step size, the larger the contribution of the weight vector updates to the next weight vector value.
-        The effective scaling of the updates is $\frac{stepsize}{blocks}$.
-        This value has to be tuned in case that the algorithm becomes unstable.
-        (Default value: <strong>1.0</strong>)
-      </p>
-    </td>
-  </tr>
-  <tr>
-    <td><strong>ThresholdValue</strong></td>
-    <td>
-      <p>
-        Defines the limiting value for the decision function above which examples are labeled as
-        positive (+1.0). Examples with a decision function value below this value are classified
-        as negative (-1.0). To obtain the raw decision function values instead, set the
-        OutputDecisionFunction parameter to true. (Default value: <strong>0.0</strong>)
-      </p>
-    </td>
-  </tr>
-  <tr>
-    <td><strong>OutputDecisionFunction</strong></td>
-    <td>
-      <p>
-        Determines whether the predict and evaluate functions of the SVM should return the distance
-        to the separating hyperplane, or binary class labels. Setting this to true will 
-        return the raw distance to the hyperplane for each example. Setting it to false will 
-        return the binary class label (+1.0, -1.0). (Default value: <strong>false</strong>)
-      </p>
-    </td>
-  </tr>
-  <tr>
-  <td><strong>Seed</strong></td>
-  <td>
-    <p>
-      Defines the seed to initialize the random number generator.
-      The seed directly controls which data points are chosen for the SDCA method.
-      (Default value: <strong>Random Long Integer</strong>)
-    </p>
-  </td>
-</tr>
-</tbody>
-</table>
-
-## Examples
-
-{% highlight scala %}
-import org.apache.flink.api.scala._
-import org.apache.flink.ml.math.Vector
-import org.apache.flink.ml.common.LabeledVector
-import org.apache.flink.ml.classification.SVM
-import org.apache.flink.ml.RichExecutionEnvironment
-
-val pathToTrainingFile: String = ???
-val pathToTestingFile: String = ???
-val env = ExecutionEnvironment.getExecutionEnvironment
-
-// Read the training data set from a LibSVM formatted file
-val trainingDS: DataSet[LabeledVector] = env.readLibSVM(pathToTrainingFile)
-
-// Create the SVM learner
-val svm = SVM()
-  .setBlocks(10)
-
-// Learn the SVM model
-svm.fit(trainingDS)
-
-// Read the testing data set
-val testingDS: DataSet[Vector] = env.readLibSVM(pathToTestingFile).map(_.vector)
-
-// Calculate the predictions for the testing data set
-val predictionDS: DataSet[(Vector, Double)] = svm.predict(testingDS)
-
-{% endhighlight %}
+The *SVM guide* has been moved. Redirecting to [{{ site.baseurl }}/apis/batch/libs/ml/svm.html]({{ site.baseurl }}/apis/batch/libs/ml/svm.html) in 1 second.
\ No newline at end of file

