spark-commits mailing list archives

From jkbrad...@apache.org
Subject spark git commit: [SPARK-18795][ML][SPARKR][DOC] Added KSTest section to SparkR vignettes
Date Wed, 14 Dec 2016 22:10:43 GMT
Repository: spark
Updated Branches:
  refs/heads/master 1ac6567bd -> 786274257


[SPARK-18795][ML][SPARKR][DOC] Added KSTest section to SparkR vignettes

## What changes were proposed in this pull request?

Added short section for KSTest.
Also added logreg model to list of ML models in vignette.  (This will be reorganized under
SPARK-18849)

![screen shot 2016-12-14 at 1 37 31 pm](https://cloud.githubusercontent.com/assets/5084283/21202140/7f24e240-c202-11e6-9362-458208bb9159.png)

## How was this patch tested?

Manually tested example locally.
Built vignettes locally.

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #16283 from jkbradley/ksTest-vignette.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/78627425
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/78627425
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/78627425

Branch: refs/heads/master
Commit: 78627425708a0afbe113efdf449e8622b43b652d
Parents: 1ac6567
Author: Joseph K. Bradley <joseph@databricks.com>
Authored: Wed Dec 14 14:10:40 2016 -0800
Committer: Joseph K. Bradley <joseph@databricks.com>
Committed: Wed Dec 14 14:10:40 2016 -0800

----------------------------------------------------------------------
 R/pkg/vignettes/sparkr-vignettes.Rmd | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/78627425/R/pkg/vignettes/sparkr-vignettes.Rmd
----------------------------------------------------------------------
diff --git a/R/pkg/vignettes/sparkr-vignettes.Rmd b/R/pkg/vignettes/sparkr-vignettes.Rmd
index 334daa5..d507e2c 100644
--- a/R/pkg/vignettes/sparkr-vignettes.Rmd
+++ b/R/pkg/vignettes/sparkr-vignettes.Rmd
@@ -469,6 +469,10 @@ SparkR supports the following machine learning models and algorithms.
 
 * Isotonic Regression Model
 
+* Logistic Regression Model
+
+* Kolmogorov-Smirnov Test
+
 More will be added in the future.
 
 ### R Formula
@@ -800,7 +804,7 @@ newDF <- createDataFrame(data.frame(x = c(1.5, 3.2)))
 head(predict(isoregModel, newDF))
 ```
 
-### Logistic Regression Model
+#### Logistic Regression Model
 
 (Added in 2.1.0)
 
@@ -834,6 +838,29 @@ model <- spark.logit(df, Species ~ ., regParam = 0.5)
 summary(model)
 ```
 
+#### Kolmogorov-Smirnov Test
+
+`spark.kstest` runs a two-sided, one-sample [Kolmogorov-Smirnov (KS) test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).
+Given a `SparkDataFrame`, the test compares continuous data in a given column `testCol` with the theoretical distribution
+specified by parameter `nullHypothesis`.
+Users can call `summary` to get a summary of the test results.
+
+In the following example, we test whether the `longley` dataset's `Armed_Forces` column
+follows a normal distribution.  We set the parameters of the normal distribution using
+the mean and standard deviation of the sample.
+
+```{r, warning=FALSE}
+df <- createDataFrame(longley)
+afStats <- head(select(df, mean(df$Armed_Forces), sd(df$Armed_Forces)))
+afMean <- afStats[1]
+afStd <- afStats[2]
+
+test <- spark.kstest(df, "Armed_Forces", "norm", c(afMean, afStd))
+testSummary <- summary(test)
+testSummary
+```
+
+
 ### Model Persistence
 The following example shows how to save/load an ML model by SparkR.
 ```{r, warning=FALSE}

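The sketch below shows what the new vignette example computes. It is a base-R illustration, not SparkR's implementation, and it assumes the in-memory `longley` data frame, whose column is named `Armed.Forces` (`createDataFrame` exposes it as `Armed_Forces`). It evaluates the two-sided statistic D = sup |F_n(x) - F(x)| at the sorted sample points and cross-checks the result against `stats::ks.test`.

```r
# Base-R illustration of the two-sided, one-sample KS statistic used in the
# vignette example. Not part of SparkR; runs on the local data.frame only.
x <- datasets::longley$Armed.Forces  # SparkR's createDataFrame names it Armed_Forces
afMean <- mean(x)
afStd <- sd(x)  # sample standard deviation, like SparkR's sd() (stddev_samp)

# D = sup_x |F_n(x) - F(x)|; for a continuous F the supremum is attained
# just at or just before one of the sorted sample points.
xs <- sort(x)
n <- length(xs)
cdf <- pnorm(xs, mean = afMean, sd = afStd)
dPlus <- max(seq_len(n) / n - cdf)          # empirical CDF above the null CDF
dMinus <- max(cdf - (seq_len(n) - 1) / n)   # null CDF above the empirical CDF
D <- max(dPlus, dMinus)

# Base R's ks.test reports the same statistic together with a p-value
res <- stats::ks.test(x, "pnorm", mean = afMean, sd = afStd)
print(D)
print(res)
```

Comparing `D` with the `statistic` field of `res` confirms that the hand computation and `ks.test` agree.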
