spark-commits mailing list archives

Subject spark git commit: [SPARK-4711] [mllib] [docs] Programming guide advice on choosing optimizer
Date Thu, 04 Dec 2014 00:58:21 GMT
Repository: spark
Updated Branches:
  refs/heads/master 1826372d0 -> 27ab0b8a0

[SPARK-4711] [mllib] [docs] Programming guide advice on choosing optimizer

I have heard requests for the docs to include advice about choosing an optimization method.
The programming guide could include a brief statement about this (so the user does not have
to read the whole optimization section).

CC: mengxr

Author: Joseph K. Bradley <>

Closes #3569 from jkbradley/lr-doc and squashes the following commits:

654aeb5 [Joseph K. Bradley] updated section header for mllib-optimization
5035ad0 [Joseph K. Bradley] updated based on review
94f6dec [Joseph K. Bradley] Updated linear methods and optimization docs with quick advice
on choosing an optimization method


Branch: refs/heads/master
Commit: 27ab0b8a03b711e8d86b6167df833f012205ccc7
Parents: 1826372
Author: Joseph K. Bradley <>
Authored: Thu Dec 4 08:58:03 2014 +0800
Committer: Xiangrui Meng <>
Committed: Thu Dec 4 08:58:03 2014 +0800

 docs/mllib-linear-methods.md | 10 +++++++---
 docs/mllib-optimization.md   | 17 +++++++++++------
 2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/docs/mllib-linear-methods.md b/docs/mllib-linear-methods.md
index bc914a1..44b7f67 100644
--- a/docs/mllib-linear-methods.md
+++ b/docs/mllib-linear-methods.md
@@ -110,12 +110,16 @@ However, L1 regularization can help promote sparsity in weights leading to small
 It is not recommended to train models without any regularization,
 especially when the number of training examples is small.
+### Optimization
+Under the hood, linear methods use convex optimization methods to optimize the objective functions.  MLlib uses two methods, SGD and L-BFGS, described in the [optimization section](mllib-optimization.html).
+Currently, most algorithm APIs support Stochastic Gradient Descent (SGD), and a few support L-BFGS. Refer to [this optimization section](mllib-optimization.html#Choosing-an-Optimization-Method) for guidelines on choosing between optimization methods.
 ## Binary classification
 [Binary classification](
 aims to divide items into two categories: positive and negative.  MLlib
-supports two linear methods for binary classification: linear support vector
-machines (SVMs) and logistic regression. For both methods, MLlib supports
+supports two linear methods for binary classification: linear Support Vector
+Machines (SVMs) and logistic regression. For both methods, MLlib supports
 L1 and L2 regularized variants. The training data set is represented by an RDD
 of [LabeledPoint](mllib-data-types.html) in MLlib.  Note that, in the
 mathematical formulation in this guide, a training label $y$ is denoted as
@@ -123,7 +127,7 @@ either $+1$ (positive) or $-1$ (negative), which is convenient for the
 formulation.  *However*, the negative label is represented by $0$ in MLlib
 instead of $-1$, to be consistent with multiclass labeling.
-### Linear support vector machines (SVMs)
+### Linear Support Vector Machines (SVMs)
 The [linear SVM](
 is a standard method for large-scale classification tasks. It is a linear method as described
above in equation `$\eqref{eq:regPrimal}$`, with the loss function in the formulation given
by the hinge loss:
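
As a minimal sketch of the guidance added above (assuming an existing `SparkContext` named `sc`; the data path and iteration count are purely illustrative), choosing between the two optimizers for logistic regression could look like this:

{% highlight scala %}
import org.apache.spark.mllib.classification.{LogisticRegressionWithLBFGS, LogisticRegressionWithSGD}
import org.apache.spark.mllib.util.MLUtils

// Load training data in LIBSVM format (illustrative path).
val training = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt").cache()

// SGD-based optimizer: supported by most algorithm APIs.
val sgdModel = LogisticRegressionWithSGD.train(training, 100)

// L-BFGS-based optimizer: generally preferred when available,
// since it tends to converge in fewer iterations.
val lbfgsModel = new LogisticRegressionWithLBFGS().run(training)
{% endhighlight %}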
diff --git a/docs/mllib-optimization.md b/docs/mllib-optimization.md
index 45141c2..4d101af 100644
--- a/docs/mllib-optimization.md
+++ b/docs/mllib-optimization.md
@@ -138,6 +138,12 @@ vertical scalability issue (the number of training features) when computing
 explicitly in Newton's method. As a result, L-BFGS often achieves faster convergence compared with other first-order optimization methods.
+### Choosing an Optimization Method
+[Linear methods](mllib-linear-methods.html) use optimization internally, and some linear methods in MLlib support both SGD and L-BFGS.
+Different optimization methods can have different convergence guarantees depending on the properties of the objective function, and we cannot cover the literature here.
+In general, when L-BFGS is available, we recommend using it instead of SGD since L-BFGS tends to converge faster (in fewer iterations).
 ## Implementation in MLlib
 ### Gradient descent and stochastic gradient descent
@@ -168,10 +174,7 @@ descent. All updaters in MLlib use a step size at the t-th step equal
 * `regParam` is the regularization parameter when using L1 or L2 regularization.
 * `miniBatchFraction` is the fraction of the total data that is sampled in 
 each iteration, to compute the gradient direction.
-Available algorithms for gradient descent:
-* [GradientDescent](api/scala/index.html#org.apache.spark.mllib.optimization.GradientDescent)
+  * Sampling still requires a pass over the entire RDD, so decreasing `miniBatchFraction` may not speed up optimization much.  Users will see the greatest speedup when the gradient is expensive to compute, because only the chosen samples are used for computing the gradient.
 ### L-BFGS
 L-BFGS is currently only a low-level optimization primitive in `MLlib`. If you want to use
L-BFGS in various 
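
As a rough sketch of how the parameters described above are set (again assuming `sc` and an illustrative data path; the values are arbitrary), an SGD-based algorithm exposes its underlying `GradientDescent` optimizer directly:

{% highlight scala %}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.util.MLUtils

val training = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt").cache()

val lr = new LogisticRegressionWithSGD()
lr.optimizer
  .setNumIterations(200)
  .setStepSize(1.0)
  .setRegParam(0.1)
  // A smaller fraction still triggers a pass over the whole RDD for sampling.
  .setMiniBatchFraction(1.0)

val model = lr.run(training)
{% endhighlight %}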
@@ -359,13 +362,15 @@ public class LBFGSExample {
 {% endhighlight %}
-#### Developer's note
+## Developer's notes
 Since the Hessian is constructed approximately from previous gradient evaluations, 
 the objective function cannot be changed during the optimization process. 
 As a result, Stochastic L-BFGS will not work naively by just using miniBatch; 
 therefore, we don't provide this until we have a better understanding.
-* `Updater` is a class originally designed for gradient descent which computes 
+`Updater` is a class originally designed for gradient descent which computes 
 the actual gradient descent step. However, for L-BFGS we can take the gradient and 
 loss of the regularized objective function by ignoring the parts of the logic used 
 only for gradient descent, such as the adaptive step size. We will refactor
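
For the low-level L-BFGS primitive and the `Updater` mentioned in the notes above, a condensed sketch along the lines of the guide's example (assuming `sc`; the path, regularization parameter, and iteration limits are illustrative) shows how the `Gradient` and `Updater` are passed in explicitly:

{% highlight scala %}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
import org.apache.spark.mllib.util.MLUtils

val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
val numFeatures = data.take(1)(0).features.size

// The low-level API works on (label, features) pairs; append a bias term.
val training = data.map(p => (p.label, MLUtils.appendBias(p.features))).cache()

// Returns the final weights and the loss recorded at each iteration.
val (weights, lossHistory) = LBFGS.runLBFGS(
  training,
  new LogisticGradient(),
  new SquaredL2Updater(), // supplies the regularization part of the gradient and loss
  10,                     // numCorrections
  1e-4,                   // convergenceTol
  20,                     // maxNumIterations
  0.1,                    // regParam
  Vectors.dense(new Array[Double](numFeatures + 1)))
{% endhighlight %}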

