spark-commits mailing list archives

From shiva...@apache.org
Subject spark git commit: [SPARKR][DOC] minor formatting and output cleanup for R vignettes
Date Tue, 04 Oct 2016 16:22:29 GMT
Repository: spark
Updated Branches:
  refs/heads/master c17f97183 -> 068c198e9


[SPARKR][DOC] minor formatting and output cleanup for R vignettes

## What changes were proposed in this pull request?

Clean up output, format the table, truncate long example output, and hide warnings.

(new - Left; existing - Right)
![image](https://cloud.githubusercontent.com/assets/8969467/19064018/5dcde4d0-89bc-11e6-857b-052df3f52a4e.png)

![image](https://cloud.githubusercontent.com/assets/8969467/19064034/6db09956-89bc-11e6-8e43-232d5c3fe5e6.png)

![image](https://cloud.githubusercontent.com/assets/8969467/19064058/88f09590-89bc-11e6-9993-61639e29dfdd.png)

![image](https://cloud.githubusercontent.com/assets/8969467/19064066/95ccbf64-89bc-11e6-877f-45af03ddcadc.png)

![image](https://cloud.githubusercontent.com/assets/8969467/19064082/a8445404-89bc-11e6-8532-26d8bc9b206f.png)
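
For reference, the output cleanup in the diff below relies on standard knitr chunk options; a minimal sketch (the chunk itself is illustrative, the options are the ones used in the change):

```{r, message=FALSE, results="hide", warning=FALSE}
# message=FALSE drops startup messages, results="hide" suppresses printed
# output, and warning=FALSE hides warnings in the rendered vignette
sparkR.session()
```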

## How was this patch tested?

Run create-doc.sh manually

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #15340 from felixcheung/vignettes.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/068c198e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/068c198e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/068c198e

Branch: refs/heads/master
Commit: 068c198e956346b90968a4d74edb7bc820c4be28
Parents: c17f971
Author: Felix Cheung <felixcheung_m@hotmail.com>
Authored: Tue Oct 4 09:22:26 2016 -0700
Committer: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Committed: Tue Oct 4 09:22:26 2016 -0700

----------------------------------------------------------------------
 R/pkg/vignettes/sparkr-vignettes.Rmd | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/068c198e/R/pkg/vignettes/sparkr-vignettes.Rmd
----------------------------------------------------------------------
diff --git a/R/pkg/vignettes/sparkr-vignettes.Rmd b/R/pkg/vignettes/sparkr-vignettes.Rmd
index aea52db..80e8760 100644
--- a/R/pkg/vignettes/sparkr-vignettes.Rmd
+++ b/R/pkg/vignettes/sparkr-vignettes.Rmd
@@ -26,7 +26,7 @@ library(SparkR)
 
 We use default settings in which it runs in local mode. It auto downloads Spark package in
the background if no previous installation is found. For more details about setup, see [Spark
Session](#SetupSparkSession).
 
-```{r, message=FALSE}
+```{r, message=FALSE, results="hide"}
 sparkR.session()
 ```
 
@@ -114,10 +114,12 @@ In particular, the following Spark driver properties can be set in `sparkConfig`
 
 Property Name | Property group | spark-submit equivalent
 ---------------- | ------------------ | ----------------------
-spark.driver.memory | Application Properties | --driver-memory
-spark.driver.extraClassPath | Runtime Environment | --driver-class-path
-spark.driver.extraJavaOptions | Runtime Environment | --driver-java-options
-spark.driver.extraLibraryPath | Runtime Environment | --driver-library-path
+`spark.driver.memory` | Application Properties | `--driver-memory`
+`spark.driver.extraClassPath` | Runtime Environment | `--driver-class-path`
+`spark.driver.extraJavaOptions` | Runtime Environment | `--driver-java-options`
+`spark.driver.extraLibraryPath` | Runtime Environment | `--driver-library-path`
+`spark.yarn.keytab` | Application Properties | `--keytab`
+`spark.yarn.principal` | Application Properties | `--principal`
 
 **For Windows users**: Due to different file prefixes across operating systems, to avoid
the issue of potential wrong prefix, a current workaround is to specify `spark.sql.warehouse.dir`
when starting the `SparkSession`.
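
(Editorial illustration, not part of the diff: both the driver properties in the table above and the Windows workaround can be passed through `sparkConfig`; the values below are placeholders.)

```{r, eval=FALSE}
# placeholder values for illustration only
sparkR.session(sparkConfig = list(spark.driver.memory = "2g",
                                  spark.sql.warehouse.dir = "C:/tmp/spark-warehouse"))
```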
 
@@ -161,7 +163,7 @@ head(df)
 ### Data Sources
 SparkR supports operating on a variety of data sources through the `SparkDataFrame` interface.
You can check the Spark SQL programming guide for more [specific options](https://spark.apache.org/docs/latest/sql-programming-guide.html#manually-specifying-options)
that are available for the built-in data sources.
 
-The general method for creating `SparkDataFrame` from data sources is `read.df`. This method
takes in the path for the file to load and the type of data source, and the currently active
Spark Session will be used automatically. SparkR supports reading CSV, JSON and Parquet files
natively and through Spark Packages you can find data source connectors for popular file formats
like Avro. These packages can be added with `sparkPackages` parameter when initializing SparkSession
using `sparkR.session'.`
+The general method for creating `SparkDataFrame` from data sources is `read.df`. This method
takes in the path for the file to load and the type of data source, and the currently active
Spark Session will be used automatically. SparkR supports reading CSV, JSON and Parquet files
natively and through Spark Packages you can find data source connectors for popular file formats
like Avro. These packages can be added with `sparkPackages` parameter when initializing SparkSession
using `sparkR.session`.
 
 ```{r, eval=FALSE}
 sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")
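
(Editorial illustration, not part of the diff: a minimal `read.df` sketch; the file path is a placeholder.)

```{r, eval=FALSE}
# path is a placeholder; source selects the built-in data source
people <- read.df("path/to/people.json", source = "json")
head(people)
```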
@@ -406,10 +408,17 @@ class(model.summaries)
 ```
 
 
-To avoid lengthy display, we only present the result of the second fitted model. You are
free to inspect other models as well.
+To avoid lengthy display, we only present the partial result of the second fitted model.
You are free to inspect other models as well.
+```{r, include=FALSE}
+ops <- options()
+options(max.print=40)
+```
 ```{r}
 print(model.summaries[[2]])
 ```
+```{r, include=FALSE}
+options(ops)
+```
 
 
 ### SQL Queries
@@ -544,7 +553,7 @@ head(select(kmeansPredictions, "model", "mpg", "hp", "wt", "prediction"),
n = 20
 Survival analysis studies the expected duration of time until an event happens, and often
the relationship with risk factors or treatment taken on the subject. In contrast to standard
regression analysis, survival modeling has to deal with special characteristics in the data
including non-negative survival time and censoring.
 
 Accelerated Failure Time (AFT) model is a parametric survival model for censored data that
assumes the effect of a covariate is to accelerate or decelerate the life course of an event
by some constant. For more information, refer to the Wikipedia page [AFT Model](https://en.wikipedia.org/wiki/Accelerated_failure_time_model)
and the references there. Different from a [Proportional Hazards Model](https://en.wikipedia.org/wiki/Proportional_hazards_model)
designed for the same purpose, the AFT model is easier to parallelize because each instance
contributes to the objective function independently.
-```{r}
+```{r, warning=FALSE}
 library(survival)
 ovarianDF <- createDataFrame(ovarian)
 aftModel <- spark.survreg(ovarianDF, Surv(futime, fustat) ~ ecog_ps + rx)
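
(Editorial illustration, not part of the diff: a fitted AFT model can typically be inspected and applied with `summary` and `predict`, sketched here.)

```{r, eval=FALSE}
# inspect the fitted coefficients and score the training data
summary(aftModel)
aftPredictions <- predict(aftModel, ovarianDF)
head(aftPredictions)
```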
@@ -678,7 +687,7 @@ MLPC employs backpropagation for learning the model. We use the logistic
loss fu
 
 * `tol`: convergence tolerance of iterations.
 
-* `stepSize`: step size for `"gd"`.	
+* `stepSize`: step size for `"gd"`.
 
 * `seed`: seed parameter for weights initialization.
 
@@ -763,8 +772,8 @@ We also expect Decision Tree, Random Forest, Kolmogorov-Smirnov Test coming
in t
 
 ### Model Persistence
 The following example shows how to save/load an ML model by SparkR.
-```{r}
-irisDF <- suppressWarnings(createDataFrame(iris))
+```{r, warning=FALSE}
+irisDF <- createDataFrame(iris)
 gaussianGLM <- spark.glm(irisDF, Sepal_Length ~ Sepal_Width + Species, family = "gaussian")
 
 # Save and then load a fitted MLlib model
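
(Editorial illustration, not part of the diff, which is truncated at this point: SparkR models are typically saved with `write.ml` and reloaded with `read.ml`; the path is a placeholder.)

```{r, eval=FALSE}
# save the fitted model to a temporary path and load it back
modelPath <- tempfile(pattern = "gaussianGLM", fileext = ".tmp")
write.ml(gaussianGLM, modelPath)
gaussianGLM2 <- read.ml(modelPath)
summary(gaussianGLM2)
```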

