Subject spark git commit: [SPARK-18812][MLLIB] explain "Spark ML"
Date Sat, 10 Dec 2016 01:34:56 GMT
[SPARK-18812][MLLIB] explain "Spark ML"

## What changes were proposed in this pull request?

There has been some confusion around "Spark ML" vs. "MLlib". This PR adds some FAQ-like entries
to the MLlib user guide to explain "Spark ML" and reduce the confusion.

I checked the Spark FAQ page, which seems too high-level
for the content here. So I added them to the MLlib user guide instead.

cc: mateiz

Author: Xiangrui Meng <>

Closes #16241 from mengxr/SPARK-18812.


@@ -35,6 +35,18 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin
 * The DataFrame-based API for MLlib provides a uniform API across ML algorithms and across
multiple languages.
 * DataFrames facilitate practical ML Pipelines, particularly feature transformations.  See
the [Pipelines guide](ml-pipeline.html) for details.
+*What is "Spark ML"?*
+* "Spark ML" is not an official name but occasionally used to refer to the MLlib DataFrame-based API.
+  This is majorly due to the `` Scala package name used by the DataFrame-based API,
+  and the "Spark ML Pipelines" term we used initially to emphasize the pipeline concept.
+*Is MLlib deprecated?*
+* No. MLlib includes both the RDD-based API and the DataFrame-based API.
+  The RDD-based API is now in maintenance mode.
+  But neither API is deprecated, nor MLlib as a whole.
 # Dependencies
 MLlib uses the linear algebra package [Breeze](, which depends on

