# spark-commits mailing list archives

From m...@apache.org
Subject spark git commit: [SPARK-4711] [mllib] [docs] Programming guide advice on choosing optimizer
Date Thu, 04 Dec 2014 00:58:21 GMT
Repository: spark
Updated Branches:

[SPARK-4711] [mllib] [docs] Programming guide advice on choosing optimizer

I have heard requests for the docs to include advice about choosing an optimization method.
to read the whole optimization section).

CC: mengxr

Closes #3569 from jkbradley/lr-doc and squashes the following commits:

94f6dec [Joseph K. Bradley] Updated linear methods and optimization docs with quick advice on choosing an optimization method

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/27ab0b8a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/27ab0b8a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/27ab0b8a

Commit: 27ab0b8a03b711e8d86b6167df833f012205ccc7
Parents: 1826372
Authored: Thu Dec 4 08:58:03 2014 +0800
Committer: Xiangrui Meng <meng@databricks.com>
Committed: Thu Dec 4 08:58:03 2014 +0800

----------------------------------------------------------------------
docs/mllib-linear-methods.md | 10 +++++++---
docs/mllib-optimization.md   | 17 +++++++++++------
2 files changed, 18 insertions(+), 9 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/27ab0b8a/docs/mllib-linear-methods.md
----------------------------------------------------------------------
diff --git a/docs/mllib-linear-methods.md b/docs/mllib-linear-methods.md
index bc914a1..44b7f67 100644
--- a/docs/mllib-linear-methods.md
+++ b/docs/mllib-linear-methods.md
@@ -110,12 +110,16 @@ However, L1 regularization can help promote sparsity in weights leading to small
It is not recommended to train models without any regularization,
especially when the number of training examples is small.

+### Optimization
+
+Under the hood, linear methods use convex optimization methods to optimize the objective functions.  MLlib uses two methods, SGD and L-BFGS, described in the [optimization section](mllib-optimization.html).  Currently, most algorithm APIs support Stochastic Gradient Descent (SGD), and a few support L-BFGS. Refer to [this optimization section](mllib-optimization.html#Choosing-an-Optimization-Method) for guidelines on choosing between optimization methods.
+
## Binary classification

[Binary classification](http://en.wikipedia.org/wiki/Binary_classification)
aims to divide items into two categories: positive and negative.  MLlib
-supports two linear methods for binary classification: linear support vector
-machines (SVMs) and logistic regression. For both methods, MLlib supports
+supports two linear methods for binary classification: linear Support Vector
+Machines (SVMs) and logistic regression. For both methods, MLlib supports
L1 and L2 regularized variants. The training data set is represented by an RDD
of [LabeledPoint](mllib-data-types.html) in MLlib.  Note that, in the
mathematical formulation in this guide, a training label $y$ is denoted as
@@ -123,7 +127,7 @@ either $+1$ (positive) or $-1$ (negative), which is convenient for the
formulation.  *However*, the negative label is represented by $0$ in MLlib
instead of $-1$, to be consistent with multiclass labeling.

-### Linear support vector machines (SVMs)
+### Linear Support Vector Machines (SVMs)

The [linear SVM](http://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
is a standard method for large-scale classification tasks. It is a linear method as described
above in equation $\eqref{eq:regPrimal}$, with the loss function in the formulation given
by the hinge loss:

http://git-wip-us.apache.org/repos/asf/spark/blob/27ab0b8a/docs/mllib-optimization.md
----------------------------------------------------------------------
diff --git a/docs/mllib-optimization.md b/docs/mllib-optimization.md
index 45141c2..4d101af 100644
--- a/docs/mllib-optimization.md
+++ b/docs/mllib-optimization.md
@@ -138,6 +138,12 @@ vertical scalability issue (the number of training features) when computing the
explicitly in Newton's method. As a result, L-BFGS often achieves rapider convergence compared with
other first-order optimization.

+### Choosing an Optimization Method
+
+[Linear methods](mllib-linear-methods.html) use optimization internally, and some linear methods in MLlib support both SGD and L-BFGS.
+Different optimization methods can have different convergence guarantees depending on the properties of the objective function, and we cannot cover the literature here.
+In general, when L-BFGS is available, we recommend using it instead of SGD since L-BFGS tends to converge faster (in fewer iterations).
+
## Implementation in MLlib

@@ -168,10 +174,7 @@ descent. All updaters in MLlib use a step size at the t-th step equal to
* regParam is the regularization parameter when using L1 or L2 regularization.
* miniBatchFraction is the fraction of the total data that is sampled in
each iteration, to compute the gradient direction.
-
-
+  * Sampling still requires a pass over the entire RDD, so decreasing miniBatchFraction may not speed up optimization much.  Users will see the greatest speedup when the gradient is expensive to compute, for only the chosen samples are used for computing the gradient.

### L-BFGS
L-BFGS is currently only a low-level optimization primitive in MLlib. If you want to use L-BFGS in various
@@ -359,13 +362,15 @@ public class LBFGSExample {
{% endhighlight %}
</div>
</div>
-#### Developer's note
+
+## Developer's notes
+
Since the Hessian is constructed approximately from previous gradient evaluations,
the objective function can not be changed during the optimization process.
As a result, Stochastic L-BFGS will not work naively by just using miniBatch;
therefore, we don't provide this until we have better understanding.

-* Updater is a class originally designed for gradient decent which computes
+Updater is a class originally designed for gradient decent which computes
the actual gradient descent step. However, we're able to take the gradient and
loss of objective function of regularization for L-BFGS by ignoring the part of logic
only for gradient decent such as adaptive step size stuff. We will refactorize
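
The miniBatchFraction note added in the diff above can be illustrated with a small sketch. It is plain Python rather than Spark code, and the function name `minibatch_gradient`, the least-squares loss, and the toy data are all assumptions made for illustration; none of it is MLlib's API. It only shows why shrinking the fraction saves gradient work but not the scan over the data.

```python
import random


def minibatch_gradient(data, weights, fraction, seed):
    """Hypothetical helper: one mini-batch gradient of a least-squares loss.

    Returns (gradient, records_scanned, gradients_computed).
    """
    rng = random.Random(seed)
    grad = [0.0] * len(weights)
    scanned = 0
    used = 0
    for features, label in data:
        # Every record is visited so that the sampler can decide whether
        # it belongs to the mini-batch; this cost does not shrink.
        scanned += 1
        if rng.random() >= fraction:
            continue
        # Only sampled records pay for the gradient computation.
        used += 1
        error = sum(w * x for w, x in zip(weights, features)) - label
        for j, x in enumerate(features):
            grad[j] += error * x
    if used:
        grad = [g / used for g in grad]
    return grad, scanned, used


# Toy data (an assumption): label = 1 + 2 * i with features [1, i].
data = [([1.0, float(i)], 2.0 * i + 1.0) for i in range(1000)]
grad, scanned, used = minibatch_gradient(data, [0.0, 0.0], fraction=0.1, seed=42)
print(scanned, used)
```

In this sketch `scanned` always equals the data set size, while `used` shrinks with `fraction`, so the saving grows only when the per-record gradient is the expensive part.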
