spark-commits mailing list archives

Subject spark git commit: [SPARK-19797][DOC] ML pipeline document correction
Date Fri, 03 Mar 2017 10:56:11 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-2.1 1237aaea2 -> accbed7c2

[SPARK-19797][DOC] ML pipeline document correction

## What changes were proposed in this pull request?
The description of the pipeline in this paragraph is incorrect:

> If the Pipeline had more **stages**, it would call the LogisticRegressionModel’s transform()
method on the DataFrame before passing the DataFrame to the next stage.

Reason: a `Transformer` can also be a stage, but only a subsequent `Estimator` causes the `transform()` call and the passing of the transformed data to the next stage. The current wording therefore misrepresents how an ML pipeline works.
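The reasoning above can be illustrated with a minimal Python sketch of the fit loop, modeled on how Spark's `Pipeline.fit()` treats its stages: each stage up to the last `Estimator` is fitted (or reused, if it is already a `Transformer`), and `transform()` is applied to the running dataset only for stages that come before that last `Estimator`. Everything here is hypothetical and for illustration only: `Transformer`, `Estimator`, `fit_pipeline`, `AddOne` and `MeanShift` are toy stand-ins rather than Spark's API, and a plain list replaces the `DataFrame`.

```python
from typing import List


class Transformer:
    """Hypothetical stand-in for a Spark Transformer."""

    def transform(self, data: list) -> list:
        raise NotImplementedError


class Estimator:
    """Hypothetical stand-in for a Spark Estimator: fit() returns a Transformer."""

    def fit(self, data: list) -> Transformer:
        raise NotImplementedError


def fit_pipeline(stages: list, data: list, log: List[str]) -> List[Transformer]:
    """Fit stages in order and return the fitted transformers (a toy 'PipelineModel').

    transform() is called on the running dataset only for stages before the
    last Estimator, because only a later Estimator needs that output.
    """
    # Index of the last Estimator; -1 if the pipeline contains none.
    last_est = max((i for i, s in enumerate(stages) if isinstance(s, Estimator)), default=-1)
    fitted: List[Transformer] = []
    current = data
    for i, stage in enumerate(stages):
        if i <= last_est:
            model = stage.fit(current) if isinstance(stage, Estimator) else stage
            if i < last_est:
                # A later Estimator still needs this stage's output.
                current = model.transform(current)
                log.append(f"transform stage {i}")
            fitted.append(model)
        else:
            # Trailing Transformers are kept as-is; no later fit() needs their output.
            fitted.append(stage)
    return fitted


class AddOne(Transformer):
    """Toy Transformer: adds 1 to every value."""

    def transform(self, data):
        return [x + 1 for x in data]


class MeanShift(Estimator):
    """Toy Estimator: learns the mean and returns a model that subtracts it."""

    def fit(self, data):
        mean = sum(data) / len(data)

        class Model(Transformer):
            def transform(self, d):
                return [x - mean for x in d]

        return Model()


if __name__ == "__main__":
    log = []
    fit_pipeline([AddOne(), MeanShift()], [1, 2, 3], log)
    print(log)  # ['transform stage 0']: the fitted MeanShift model is never applied
```

With a single trailing `Estimator`, its fitted model's `transform()` is not called during fitting; adding a second `Estimator` after it (for example `[AddOne(), MeanShift(), MeanShift()]`) makes the first fitted model's `transform()` run, which is the behavior the corrected sentence describes.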

## How was this patch tested?
This is a small modification of **docs/**. I built the modified docs with Jekyll and
checked the compiled document.

Author: Zhe Sun <>

Closes #17137 from ymwdalex/SPARK-19797-ML-pipeline-document-correction.

(cherry picked from commit 0bac3e4cde75678beac02e67b8873fe779e9ad34)
Signed-off-by: Sean Owen <>


Branch: refs/heads/branch-2.1
Commit: accbed7c2cfbe46fa6f55e97241b617c6ad4431f
Parents: 1237aae
Author: Zhe Sun <>
Authored: Fri Mar 3 11:55:57 2017 +0100
Committer: Sean Owen <>
Committed: Fri Mar 3 11:56:07 2017 +0100

 docs/ | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/ b/docs/
index 7cbb146..aa92c0a 100644
--- a/docs/
+++ b/docs/
@@ -132,7 +132,7 @@ The `` method is called on the original `DataFrame`, which has raw
 The `Tokenizer.transform()` method splits the raw text documents into words, adding a new column with words to the `DataFrame`.
 The `HashingTF.transform()` method converts the words column into feature vectors, adding a new column with those vectors to the `DataFrame`.
 Now, since `LogisticRegression` is an `Estimator`, the `Pipeline` first calls `` to produce a `LogisticRegressionModel`.
-If the `Pipeline` had more stages, it would call the `LogisticRegressionModel`'s `transform()`
+If the `Pipeline` had more `Estimator`s, it would call the `LogisticRegressionModel`'s `transform()`
 method on the `DataFrame` before passing the `DataFrame` to the next stage.
 A `Pipeline` is an `Estimator`.
