Repository: spark
Updated Branches:
refs/heads/master 1826372d0 -> 27ab0b8a0
[SPARK-4711] [mllib] [docs] Programming guide advice on choosing optimizer
I have heard requests for the docs to include advice about choosing an optimization method.
The programming guide could include a brief statement about this (so the user does not have
to read the whole optimization section).
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #3569 from jkbradley/lrdoc and squashes the following commits:
654aeb5 [Joseph K. Bradley] updated section header for mllib-optimization
5035ad0 [Joseph K. Bradley] updated based on review
94f6dec [Joseph K. Bradley] Updated linear methods and optimization docs with quick advice
on choosing an optimization method
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/27ab0b8a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/27ab0b8a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/27ab0b8a
Branch: refs/heads/master
Commit: 27ab0b8a03b711e8d86b6167df833f012205ccc7
Parents: 1826372
Author: Joseph K. Bradley <joseph@databricks.com>
Authored: Thu Dec 4 08:58:03 2014 +0800
Committer: Xiangrui Meng <meng@databricks.com>
Committed: Thu Dec 4 08:58:03 2014 +0800

docs/mllib-linear-methods.md | 10 +++++++---
docs/mllib-optimization.md   | 17 +++++++++++------
2 files changed, 18 insertions(+), 9 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/27ab0b8a/docs/mllib-linear-methods.md

diff --git a/docs/mllib-linear-methods.md b/docs/mllib-linear-methods.md
index bc914a1..44b7f67 100644
--- a/docs/mllib-linear-methods.md
+++ b/docs/mllib-linear-methods.md
@@ -110,12 +110,16 @@ However, L1 regularization can help promote sparsity in weights leading to small
 It is not recommended to train models without any regularization,
 especially when the number of training examples is small.
 
+### Optimization
+
+Under the hood, linear methods use convex optimization methods to optimize the objective functions. MLlib uses two methods, SGD and L-BFGS, described in the [optimization section](mllib-optimization.html). Currently, most algorithm APIs support Stochastic Gradient Descent (SGD), and a few support L-BFGS. Refer to [this optimization section](mllib-optimization.html#Choosing-an-Optimization-Method) for guidelines on choosing between optimization methods.
+
 ## Binary classification
 
 [Binary classification](http://en.wikipedia.org/wiki/Binary_classification)
 aims to divide items into two categories: positive and negative. MLlib
-supports two linear methods for binary classification: linear support vector
-machines (SVMs) and logistic regression. For both methods, MLlib supports
+supports two linear methods for binary classification: linear Support Vector
+Machines (SVMs) and logistic regression. For both methods, MLlib supports
 L1 and L2 regularized variants. The training data set is represented by an RDD
 of [LabeledPoint](mllib-data-types.html) in MLlib. Note that, in the
 mathematical formulation in this guide, a training label $y$ is denoted as
@@ -123,7 +127,7 @@ either $+1$ (positive) or $-1$ (negative), which is convenient for the
 formulation. *However*, the negative label is represented by $0$ in MLlib
 instead of $-1$, to be consistent with multiclass labeling.
 
-### Linear support vector machines (SVMs)
+### Linear Support Vector Machines (SVMs)
 
 The [linear SVM](http://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
 is a standard method for large-scale classification tasks. It is a linear method as described
 above in equation `$\eqref{eq:regPrimal}$`, with the loss function in the formulation given
 by the hinge loss:
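The hinge loss the hunk above refers to is $L(w; x, y) = \max(0,\, 1 - y\, w^T x)$ with labels $y \in \{+1, -1\}$. As a minimal plain-Scala sketch, with hypothetical helper names that are not part of the patch or of the MLlib API:

```scala
// Hypothetical standalone illustration (not MLlib code) of the hinge loss:
// L(w; x, y) = max(0, 1 - y * (w . x)), with labels y in {+1, -1}.
object HingeLossSketch {

  // Plain dot product of two equal-length weight/feature vectors.
  def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  // Hinge loss for a single labeled point.
  def hingeLoss(w: Array[Double], x: Array[Double], y: Double): Double =
    math.max(0.0, 1.0 - y * dot(w, x))

  def main(args: Array[String]): Unit = {
    val w = Array(0.5, -0.25)
    // Correctly classified outside the margin: zero loss.
    println(hingeLoss(w, Array(4.0, 0.0), 1.0))  // 0.0
    // Misclassified point: positive loss that grows with the margin violation.
    println(hingeLoss(w, Array(4.0, 0.0), -1.0)) // 3.0
  }
}
```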
http://git-wip-us.apache.org/repos/asf/spark/blob/27ab0b8a/docs/mllib-optimization.md

diff --git a/docs/mllib-optimization.md b/docs/mllib-optimization.md
index 45141c2..4d101af 100644
--- a/docs/mllib-optimization.md
+++ b/docs/mllib-optimization.md
@@ -138,6 +138,12 @@ vertical scalability issue (the number of training features) when computing the
 Hessian explicitly in Newton's method. As a result, L-BFGS often achieves rapider convergence compared with
 other first-order optimization.
 
+### Choosing an Optimization Method
+
+[Linear methods](mllib-linear-methods.html) use optimization internally, and some linear methods in MLlib support both SGD and L-BFGS.
+Different optimization methods can have different convergence guarantees depending on the properties of the objective function, and we cannot cover the literature here.
+In general, when L-BFGS is available, we recommend using it instead of SGD since L-BFGS tends to converge faster (in fewer iterations).
+
 ## Implementation in MLlib
 
 ### Gradient descent and stochastic gradient descent
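The gradient descent this heading introduces repeatedly updates the weights against the gradient with a step size that decays over iterations (the guide states MLlib's updaters use a step size at the t-th step; a stepSize / sqrt(t) decay is assumed below). A self-contained plain-Scala sketch of that loop on a least-squares objective, with illustrative names, not the MLlib implementation:

```scala
// Hypothetical plain-Scala sketch of batch gradient descent on a least-squares
// objective, using a decaying step size stepSize / sqrt(t) (assumed here to
// match MLlib's updater schedule). Names are illustrative, not MLlib code.
object GradientDescentSketch {

  // Squared-loss gradient for one example (x, y) at weight w: (w * x - y) * x.
  def gradient(w: Double, x: Double, y: Double): Double = (w * x - y) * x

  // Run numIterations full passes over the data and return the final weight.
  def run(data: Seq[(Double, Double)], stepSize: Double, numIterations: Int): Double = {
    var w = 0.0
    for (t <- 1 to numIterations) {
      val g = data.map { case (x, y) => gradient(w, x, y) }.sum / data.size
      w -= stepSize / math.sqrt(t) * g // decaying step size at the t-th step
    }
    w
  }

  def main(args: Array[String]): Unit = {
    // The data satisfy y = 2x exactly, so the optimum is w = 2.
    val data = Seq((1.0, 2.0), (2.0, 4.0), (3.0, 6.0))
    println(run(data, stepSize = 0.1, numIterations = 200)) // close to 2.0
  }
}
```

Swapping the full-data average here for a sampled subset is exactly what the `miniBatchFraction` parameter below controls.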
@@ -168,10 +174,7 @@ descent. All updaters in MLlib use a step size at the t-th step equal to
 * `regParam` is the regularization parameter when using L1 or L2 regularization.
 * `miniBatchFraction` is the fraction of the total data that is sampled in each iteration, to compute the gradient direction.
-
-Available algorithms for gradient descent:
-
-* [GradientDescent](api/scala/index.html#org.apache.spark.mllib.optimization.GradientDescent)
+  * Sampling still requires a pass over the entire RDD, so decreasing `miniBatchFraction` may not speed up optimization much. Users will see the greatest speedup when the gradient is expensive to compute, for only the chosen samples are used for computing the gradient.
 
 ### L-BFGS
 L-BFGS is currently only a low-level optimization primitive in `MLlib`. If you want to use L-BFGS in various
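The `miniBatchFraction` note added in the hunk above (each sampling pass still scans the whole dataset, so only the gradient work shrinks) can be mimicked locally. A hypothetical plain-Scala sketch, not MLlib code, of Bernoulli-style mini-batch sampling for a squared-loss gradient:

```scala
// Hypothetical local sketch (not MLlib code) of the miniBatchFraction idea:
// every step still scans the whole dataset to draw the sample, but gradients
// are only computed for the sampled subset.
object MiniBatchSketch {
  private val rng = new scala.util.Random(42)

  // Bernoulli sampling: keep each example independently with probability
  // `fraction`, then average the squared-loss gradients of the kept examples.
  def miniBatchGradient(data: Seq[(Double, Double)], w: Double, fraction: Double): Double = {
    val batch = data.filter(_ => rng.nextDouble() < fraction) // full scan regardless of fraction
    if (batch.isEmpty) 0.0
    else batch.map { case (x, y) => (w * x - y) * x }.sum / batch.size
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((1.0, 2.0), (2.0, 4.0))
    // fraction = 1.0 always keeps every example, recovering the exact gradient.
    println(miniBatchGradient(data, w = 0.0, fraction = 1.0)) // -5.0
  }
}
```

With `fraction = 1.0` the sample deterministically equals the full dataset, which is why a small fraction saves gradient computation but not the sampling pass itself.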
@@ -359,13 +362,15 @@ public class LBFGSExample {
 {% endhighlight %}
 </div>
 </div>
 
-#### Developer's note
+
+## Developer's notes
+
 Since the Hessian is constructed approximately from previous gradient evaluations,
 the objective function can not be changed during the optimization process.
 As a result, Stochastic L-BFGS will not work naively by just using miniBatch;
 therefore, we don't provide this until we have better understanding.
 
-* `Updater` is a class originally designed for gradient descent which computes
+`Updater` is a class originally designed for gradient descent which computes
 the actual gradient descent step. However, we're able to take the gradient and
 loss of objective function of regularization for L-BFGS by ignoring the part of logic
 only for gradient descent such as adaptive step size stuff. We will refactorize

To unsubscribe, email: commits-unsubscribe@spark.apache.org
For additional commands, email: commits-help@spark.apache.org
