spark-commits mailing list archives

From sro...@apache.org
Subject spark git commit: Added more information to Imputer
Date Mon, 30 Oct 2017 07:25:22 GMT
Repository: spark
Updated Branches:
  refs/heads/master 188b47e68 -> 6eda55f72


Added more information to Imputer

Often we want to impute custom values other than 'NaN'. This addition helps people find
`setMissingValue` without reading the API docs.

Author: tengpeng <tengpeng@users.noreply.github.com>

Closes #19600 from tengpeng/patch-5.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6eda55f7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6eda55f7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6eda55f7

Branch: refs/heads/master
Commit: 6eda55f728a6f2e265ae12a7e01dae88e4172715
Parents: 188b47e
Author: tengpeng <tengpeng@users.noreply.github.com>
Authored: Mon Oct 30 07:24:55 2017 +0000
Committer: Sean Owen <sowen@cloudera.com>
Committed: Mon Oct 30 07:24:55 2017 +0000

----------------------------------------------------------------------
 docs/ml-features.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/6eda55f7/docs/ml-features.md
----------------------------------------------------------------------
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 86a0e09..7264313 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -1373,7 +1373,9 @@ for more details on the API.
 The `Imputer` transformer completes missing values in a dataset, either using the mean or the 
 median of the columns in which the missing values are located. The input columns should be of
 `DoubleType` or `FloatType`. Currently `Imputer` does not support categorical features and possibly
-creates incorrect values for columns containing categorical features.
+creates incorrect values for columns containing categorical features. Imputer can impute custom values 
+other than 'NaN' by `.setMissingValue(custom_value)`. For example, `.setMissingValue(0)` will impute 
+all occurrences of (0).
 
 **Note** all `null` values in the input columns are treated as missing, and so are also imputed.
 
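The added documentation says `.setMissingValue(custom_value)` changes which value `Imputer` treats as missing, while `null` entries are always treated as missing. As a rough illustration of those documented semantics only (not Spark's implementation), the plain-Python sketch below replaces `None` entries and every occurrence of a chosen placeholder with the mean or median of the remaining values in one column. The function `impute_column` and its parameters are hypothetical; skipping NaN values when computing the surrogate is a choice made by this sketch, and its exact `statistics.median` may not match how Spark computes a median.

```python
import math
import statistics
from typing import List, Optional


def impute_column(values: List[Optional[float]],
                  missing_value: float = math.nan,
                  strategy: str = "mean") -> List[Optional[float]]:
    """Hypothetical helper: replace None entries and entries equal to
    missing_value with the mean or median of the remaining entries."""

    def is_missing(v: Optional[float]) -> bool:
        if v is None:
            return True  # nulls are always treated as missing
        if math.isnan(missing_value):
            return math.isnan(v)  # NaN != NaN, so compare with isnan
        return v == missing_value

    # Values used for the surrogate: not missing, and (sketch's choice) not NaN,
    # so a stray NaN cannot turn the mean into NaN.
    observed = [v for v in values if not is_missing(v) and not math.isnan(v)]
    if not observed:
        raise ValueError("no non-missing values to compute a surrogate from")

    if strategy == "mean":
        surrogate = statistics.mean(observed)
    elif strategy == "median":
        surrogate = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    return [surrogate if is_missing(v) else v for v in values]


# Treat 0 as the placeholder, analogous to `.setMissingValue(0)`.
print(impute_column([1.0, 0.0, 3.0, None], missing_value=0.0))  # [1.0, 2.0, 3.0, 2.0]
```

With the default placeholder (NaN), only NaN and `None` entries are replaced; with `missing_value=0.0`, every 0 in the column is replaced as well, which is the behavior the new sentence in the diff describes.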

