Subject spark git commit: [SPARK-18264][SPARKR] build vignettes with package, update vignettes for CRAN release build and add info on release
Date Fri, 11 Nov 2016 23:49:59 GMT
Repository: spark
Updated Branches:
  refs/heads/master 6e95325fc -> ba23f768f

[SPARK-18264][SPARKR] build vignettes with package, update vignettes for CRAN release build
and add info on release

## What changes were proposed in this pull request?

Changes to DESCRIPTION to build vignettes.
Changes the metadata for vignettes to generate the recommended format (the output is less
than 10% of the previous size). Unfortunately, it does not look as nice
(before - left, after - right; screenshots omitted).


Also add information on how to run build/release to CRAN later.
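
As a quick sanity check of the `DESCRIPTION` change, the sketch below writes a scratch file holding only the fields this patch touches and reads them back with the same `grep`/`awk` extraction the check script applies to `Version`. The scratch file and the `DESC`, `VERSION` and `BUILDER` names are illustrative assumptions, not part of the patch.

```shell
# Hypothetical scratch copy of the fields this patch changes in R/pkg/DESCRIPTION.
DESC="$(mktemp)"
cat > "$DESC" <<'EOF'
Package: SparkR
Type: Package
Version: 2.1.0
Date: 2016-11-06
VignetteBuilder: knitr
EOF

# Take the last whitespace-separated token on each matching line.
VERSION=$(grep '^Version:' "$DESC" | awk '{print $NF}')
BUILDER=$(grep '^VignetteBuilder:' "$DESC" | awk '{print $NF}')

# The source tarball name is derived from the version.
echo "SparkR_${VERSION}.tar.gz (vignettes built with ${BUILDER})"

rm -f "$DESC"
```

An empty `BUILDER` would suggest that the `VignetteBuilder` field is missing, in which case `R CMD build` would not be expected to build the vignettes with knitr.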

## How was this patch tested?

manually, unit tests


We need this for branch-2.1

Author: Felix Cheung <>

Closes #15790 from felixcheung/rpkgvignettes.


Branch: refs/heads/master
Commit: ba23f768f7419039df85530b84258ec31f0c22b4
Parents: 6e95325
Author: Felix Cheung <>
Authored: Fri Nov 11 15:49:55 2016 -0800
Committer: Shivaram Venkataraman <>
Committed: Fri Nov 11 15:49:55 2016 -0800

 R/                    | 91 +++++++++++++++++++++++++++++++
 R/                          |  8 +--
 R/                      | 33 +++++++++--
 R/                     | 19 +------
 R/pkg/DESCRIPTION                    |  9 ++-
 R/pkg/vignettes/sparkr-vignettes.Rmd |  9 +--
 6 files changed, 134 insertions(+), 35 deletions(-)
diff --git a/R/ b/R/
new file mode 100644
index 0000000..bea8f9f
--- /dev/null
+++ b/R/
@@ -0,0 +1,91 @@
+# SparkR CRAN Release
+To release SparkR as a package to CRAN, we would use the `devtools` package. Please work with the
+`` community and R package maintainer on this.
+### Release
+First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.
+Note that while `` is running `R CMD check`, it is doing so with `--no-manual --no-vignettes`, which skips a few vignettes or PDF checks - therefore it will be preferred to run `R CMD check` on the source package built manually before uploading a release.
+To upload a release, we would need to update the ``. This should generally contain the results from running the `` script along with comments on status of all `WARNING` (should not be any) or `NOTE`. As a part of `` and the release process, the vignettes is build - make sure `SPARK_HOME` is set and Spark jars are accessible.
+Once everything is in place, run in R under the `SPARK_HOME/R` directory:
+paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::release(); .libPaths(paths)
+For more information please refer to
+### Testing: build package manually
+To build package manually such as to inspect the resulting `.tar.gz` file content, we would also use the `devtools` package.
+Source package is what get released to CRAN. CRAN would then build platform-specific binary packages from the source package.
+#### Build source package
+To build source package locally without releasing to CRAN, run in R under the `SPARK_HOME/R` directory:
+paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg"); .libPaths(paths)
+Similarly, the source package is also created by `` with `R CMD build pkg`.
+For example, this should be the content of the source package:
+DESCRIPTION	R		inst		tests
+NAMESPACE	build		man		vignettes
+ *.Rd files...
+#### Test source package
+To install, run this:
+R CMD INSTALL SparkR_2.1.0.tar.gz
+With "2.1.0" replaced with the version of SparkR.
+This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:
+vignette("sparkr-vignettes", package="SparkR")
+#### Build binary package
+To build binary package locally, run in R under the `SPARK_HOME/R` directory:
+paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg", binary = TRUE); .libPaths(paths)
+For example, this should be the content of the binary package:
+DESCRIPTION	Meta		R		html		tests
+INDEX		NAMESPACE	help		profile		worker
diff --git a/R/ b/R/
index 932d527..47f9a86 100644
--- a/R/
+++ b/R/
@@ -6,7 +6,7 @@ SparkR is an R package that provides a light-weight frontend to use Spark from R
 Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/`.
 By default the above script uses the system wide installation of R. However, this can be changed to any user installed location of R by setting the environment variable `R_HOME` the full path of the base directory where R is installed, before running script.
 # where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
 export R_HOME=/home/username/R
@@ -46,7 +46,7 @@ Sys.setenv(SPARK_HOME="/Users/username/spark")
 # This line loads SparkR from the installed directory
 .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
-sc <- sparkR.init(master="local")
 #### Making changes to SparkR
@@ -54,11 +54,11 @@ sc <- sparkR.init(master="local")
 The [instructions](
for making contributions to Spark also apply to SparkR.
 If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/` and test your changes.
 Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/` script as described below.
 #### Generating documentation
 The SparkR documentation (Rd files and HTML files) are not a part of the source repository.
To generate them you can run the script `R/`. This script uses `devtools` and
`knitr` to generate the docs and these packages need to be installed on the machine before
using the script. Also, you may need to install these [prerequisites](
See also, `R/`
 ### Examples, Unit tests
 SparkR comes with several sample programs in the `examples/src/main/r` directory.
diff --git a/R/ b/R/
index bb33146..c5f0428 100755
--- a/R/
+++ b/R/
@@ -36,11 +36,27 @@ if [ ! -z "$R_HOME" ]
 echo "USING R_HOME = $R_HOME"
-# Build the latest docs
+# Build the latest docs, but not vignettes, which is built with the package next
-# Build a zip file containing the source package
-"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
+# Build source package with vignettes
+SPARK_HOME="$(cd "${FWDIR}"/..; pwd)"
+. "${SPARK_HOME}"/bin/
+if [ -f "${SPARK_HOME}/RELEASE" ]; then
+  SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
+if [ -d "$SPARK_JARS_DIR" ]; then
+  # Build a zip file containing the source package with vignettes
+  find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
+  echo "Error Spark JARs not found in $SPARK_HOME"
+  exit 1
 # Run check as-cran.
 VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
@@ -54,11 +70,16 @@ fi
 if [ -n "$NO_MANUAL" ]
+  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual --no-vignettes"
 echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"
+if [ -n "$NO_TESTS" ] && [ -n "$NO_MANUAL" ]
+  # This will run tests and/or build vignettes, and require SPARK_HOME
 popd > /dev/null
diff --git a/R/ b/R/
index 69ffc5f..84e6aa9 100755
--- a/R/
+++ b/R/
@@ -20,7 +20,7 @@
 # Script to create API docs and vignettes for SparkR
 # This requires `devtools`, `knitr` and `rmarkdown` to be installed on the machine.
-# After running this script the html docs can be found in 
+# After running this script the html docs can be found in
 # $SPARK_HOME/R/pkg/html
 # The vignettes can be found in
 # $SPARK_HOME/R/pkg/vignettes/sparkr_vignettes.html
@@ -52,21 +52,4 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir);
-# Find Spark jars.
-if [ -f "${SPARK_HOME}/RELEASE" ]; then
-  SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
-# Only create vignettes if Spark JARs exist
-if [ -d "$SPARK_JARS_DIR" ]; then
-  # render creates SparkR vignettes
-  Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)'
-  find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
-  echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME"
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 5a83883..fe41a9e 100644
@@ -1,8 +1,8 @@
 Package: SparkR
 Type: Package
 Title: R Frontend for Apache Spark
-Version: 2.0.0
-Date: 2016-08-27
+Version: 2.1.0
+Date: 2016-11-06
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
                     email = ""),
              person("Xiangrui", "Meng", role = "aut",
@@ -18,7 +18,9 @@ Depends:
-    survival
+    survival,
+    knitr,
+    rmarkdown
 Description: The SparkR package provides an R frontend for Apache Spark.
 License: Apache License (== 2.0)
@@ -48,3 +50,4 @@ Collate:
 RoxygenNote: 5.0.1
+VignetteBuilder: knitr
diff --git a/R/pkg/vignettes/sparkr-vignettes.Rmd b/R/pkg/vignettes/sparkr-vignettes.Rmd
index 80e8760..73a5e26 100644
--- a/R/pkg/vignettes/sparkr-vignettes.Rmd
+++ b/R/pkg/vignettes/sparkr-vignettes.Rmd
@@ -1,12 +1,13 @@
 title: "SparkR - Practical Guide"
-  html_document:
-    theme: united
+  rmarkdown::html_vignette:
     toc: true
     toc_depth: 4
-    toc_float: true
-    highlight: textmate
+vignette: >
+  %\VignetteIndexEntry{SparkR - Practical Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  \usepackage[utf8]{inputenc}
 ## Overview

