spark-commits mailing list archives

From m...@apache.org
Subject spark git commit: [SPARK-5254][MLLIB] Update the user guide to position spark.ml better
Date Thu, 15 Jan 2015 01:50:44 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 f7bbe297a -> 47fb0d0ea


[SPARK-5254][MLLIB] Update the user guide to position spark.ml better

The current statement in the user guide may send a confusing message to users. spark.ml
contains high-level APIs for building ML pipelines, but that doesn't mean spark.mllib is
being deprecated.

First of all, the pipeline API is in its alpha stage, and we need to see more use cases from
the community to stabilize it, which may take several releases. Secondly, the components
in spark.ml are simple wrappers over spark.mllib implementations. Neither the APIs nor the
implementations from spark.mllib are being deprecated. We expect users to use the spark.ml
pipeline APIs to build their ML pipelines, but we will keep supporting and adding features to
spark.mllib. For example, there are many features in review at
https://spark-prs.appspot.com/#mllib. So users should be comfortable using spark.mllib
features and expect more to come. The user guide needs to be updated to make the message clear.

Author: Xiangrui Meng <meng@databricks.com>

Closes #4052 from mengxr/SPARK-5254 and squashes the following commits:

6d5f1d3 [Xiangrui Meng] typo
0cc935b [Xiangrui Meng] update user guide to position spark.ml better

(cherry picked from commit 13d2406781714daea2bbf3bfb7fec0dead10760c)
Signed-off-by: Xiangrui Meng <meng@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47fb0d0e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/47fb0d0e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/47fb0d0e

Branch: refs/heads/branch-1.2
Commit: 47fb0d0ea4c0d4316e5ceb06e03c430a2370713b
Parents: f7bbe29
Author: Xiangrui Meng <meng@databricks.com>
Authored: Wed Jan 14 17:50:33 2015 -0800
Committer: Xiangrui Meng <meng@databricks.com>
Committed: Wed Jan 14 17:50:41 2015 -0800

----------------------------------------------------------------------
 docs/ml-guide.md    | 17 ++++++++++-------
 docs/mllib-guide.md | 18 +++++++++++-------
 2 files changed, 21 insertions(+), 14 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/47fb0d0e/docs/ml-guide.md
----------------------------------------------------------------------
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index 1c2e273..88158fd 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -3,13 +3,16 @@ layout: global
 title: Spark ML Programming Guide
 ---
 
-Spark ML is Spark's new machine learning package.  It is currently an alpha component but is potentially a successor to [MLlib](mllib-guide.html). The `spark.ml` package aims to replace the old APIs with a cleaner, more uniform set of APIs which will help users create full machine learning pipelines.
-
-MLlib vs. Spark ML:
-
-* Users can use algorithms from either of the two packages, but APIs may differ.  Currently, `spark.ml` offers a subset of the algorithms from `spark.mllib`. Since Spark ML is an alpha component, its API may change in future releases.
-* Developers should contribute new algorithms to `spark.mllib` and can optionally contribute to `spark.ml`.  See below for more details.
-* Spark ML only has Scala and Java APIs, whereas MLlib also has a Python API.
+`spark.ml` is a new package introduced in Spark 1.2, which aims to provide a uniform set of
+high-level APIs that help users create and tune practical machine learning pipelines.
+It is currently an alpha component, and we would like to hear back from the community about
+how it fits real-world use cases and how it could be improved.
+
+Note that we will keep supporting and adding features to `spark.mllib` along with the
+development of `spark.ml`.
+Users should be comfortable using `spark.mllib` features and expect more features coming.
+Developers should contribute new algorithms to `spark.mllib` and can optionally contribute
+to `spark.ml`.
 
 **Table of Contents**
 

http://git-wip-us.apache.org/repos/asf/spark/blob/47fb0d0e/docs/mllib-guide.md
----------------------------------------------------------------------
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index efd7dda..39c64d0 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -35,16 +35,20 @@ MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future releases, 
 and the migration guide below will explain all changes between releases.
 
-# spark.ml: The New ML Package
+# spark.ml: high-level APIs for ML pipelines
 
-Spark 1.2 includes a new machine learning package called `spark.ml`, currently an alpha component but potentially a successor to `spark.mllib`.  The `spark.ml` package aims to replace the old APIs with a cleaner, more uniform set of APIs which will help users create full machine learning pipelines.
+Spark 1.2 includes a new package called `spark.ml`, which aims to provide a uniform set of
+high-level APIs that help users create and tune practical machine learning pipelines.
+It is currently an alpha component, and we would like to hear back from the community about
+how it fits real-world use cases and how it could be improved.
 
-See the **[spark.ml programming guide](ml-guide.html)** for more information on this package.
-
-Users can use algorithms from either of the two packages, but APIs may differ.  Currently, `spark.ml` offers a subset of the algorithms from `spark.mllib`.
+Note that we will keep supporting and adding features to `spark.mllib` along with the
+development of `spark.ml`.
+Users should be comfortable using `spark.mllib` features and expect more features coming.
+Developers should contribute new algorithms to `spark.mllib` and can optionally contribute
+to `spark.ml`.
 
-Developers should contribute new algorithms to `spark.mllib` and can optionally contribute to `spark.ml`.
-See the `spark.ml` programming guide linked above for more details.
+See the **[spark.ml programming guide](ml-guide.html)** for more information on this package.
 
 # Dependencies
 



