spark-commits mailing list archives

Subject spark git commit: [MINOR][SPARKR][DOC] Add a description for running unit tests in Windows
Date Tue, 24 May 2016 00:20:41 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 4673b88b4 -> ca271c792

[MINOR][SPARKR][DOC] Add a description for running unit tests in Windows

## What changes were proposed in this pull request?

This PR adds the description for running unit tests in Windows.

## How was this patch tested?

On a bare machine (Windows 7, 32-bit), this was manually built and tested.

Author: hyukjinkwon <>

Closes #13217 from HyukjinKwon/minor-r-doc.

(cherry picked from commit a8e97d17b91684e68290d9f18a43622232aa94e7)
Signed-off-by: Shivaram Venkataraman <>


Branch: refs/heads/branch-2.0
Commit: ca271c79279fc2e4d4005aaf50426578d824ac92
Parents: 4673b88
Author: hyukjinkwon <>
Authored: Mon May 23 17:20:29 2016 -0700
Committer: Shivaram Venkataraman <>
Committed: Mon May 23 17:20:37 2016 -0700

 R/  |  8 +++++++-
 R/ | 20 ++++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/R/ b/R/
index 810bfc1..044f953 100644
--- a/R/
+++ b/R/
@@ -1,11 +1,13 @@
 # R on Spark
 SparkR is an R package that provides a light-weight frontend to use Spark from R.
 ### Installing sparkR
 Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running
the script `$SPARK_HOME/R/`.
 By default the above script uses the system wide installation of R. However, this can be
changed to any user installed location of R by setting the environment variable `R_HOME` the
full path of the base directory where R is installed, before running script.
 # where /home/username/R is where R is installed and /home/username/R/bin contains the files
R and RScript
 export R_HOME=/home/username/R
@@ -17,6 +19,7 @@ export R_HOME=/home/username/R
 #### Build Spark
 Build Spark with [Maven](
and include the `-Psparkr` profile to build the R package. For example to use the default
Hadoop versions you can run
   build/mvn -DskipTests -Psparkr package
@@ -38,6 +41,7 @@ To set other options like driver memory, executor memory etc. you can pass
in th
 #### Using SparkR from RStudio
 If you wish to use SparkR from RStudio or other R frontends you will need to set some environment
variables which point SparkR to your Spark installation. For example 
 # Set this to where Spark is installed
@@ -64,13 +68,15 @@ To run one of them, use `./bin/spark-submit <filename> <args>`.
For example:
     ./bin/spark-submit examples/src/main/r/dataframe.R
-You can also run the unit-tests for SparkR by running (you need to install the [testthat](
package first):
+You can also run the unit tests for SparkR by running. You need to install the [testthat](
package first:
     R -e 'install.packages("testthat", repos="")'
 ### Running on YARN
 The `./bin/spark-submit` can also be used to submit jobs to YARN clusters. You will need
to set YARN conf dir before doing so. For example on CDH you can run
 export YARN_CONF_DIR=/etc/hadoop/conf
 ./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
diff --git a/R/ b/R/
index 3f889c0..f948ed3 100644
--- a/R/
+++ b/R/
@@ -11,3 +11,23 @@ include Rtools and R in `PATH`.
 directory in Maven in `PATH`.
 4. Set `MAVEN_OPTS` as described in [Building Spark](
 5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr
+##  Unit tests
+To run the SparkR unit tests on Windows, the following steps are required —assuming you
are in the Spark root directory and do not have Apache Hadoop installed already:
+1. Create a folder to download Hadoop related files for Windows. For example, `cd ..` and
`mkdir hadoop`.
+2. Download the relevant Hadoop bin package from [steveloughran/winutils](
While these are not official ASF artifacts, they are built from the ASF release git hashes
by a Hadoop PMC member on a dedicated Windows VM. For further reading, consult [Windows Problems
on the Hadoop wiki](
+3. Install the files into `hadoop\bin`; make sure that `winutils.exe` and `hadoop.dll` are
+4. Set the environment variable `HADOOP_HOME` to the full path to the newly created `hadoop`
+5. Run unit tests for SparkR by running the command below. You need to install the [testthat](
package first:
+    ```
+    R -e "install.packages('testthat', repos='')"
+    .\bin\spark-submit2.cmd --conf"file:///" R\pkg\tests\run-all.R
+    ```
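
Steps 3 and 4 of the added Windows instructions can be checked before launching the tests. The following is a minimal sketch, not part of the commit: it assumes the two files named in step 3 are meant to exist under `hadoop\bin` (that line is truncated in this archive), and the helper names `missing_hadoop_files` and `check_hadoop_home` are hypothetical.

```python
import os
from pathlib import Path

# Files that step 3 names; assumed here to be required inside <HADOOP_HOME>\bin.
REQUIRED_FILES = ("winutils.exe", "hadoop.dll")


def missing_hadoop_files(hadoop_home):
    """Return the names from REQUIRED_FILES that are absent from <hadoop_home>/bin."""
    bin_dir = Path(hadoop_home) / "bin"
    return [name for name in REQUIRED_FILES if not (bin_dir / name).is_file()]


def check_hadoop_home(env=None):
    """Return a list of problems; an empty list means the layout looks as expected.

    `env` defaults to the process environment and can be replaced by a plain
    dict for testing (hypothetical parameter, for illustration only).
    """
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]
    bin_dir = Path(hadoop_home) / "bin"
    return [f"missing {bin_dir / name}" for name in missing_hadoop_files(hadoop_home)]
```

Calling `check_hadoop_home()` with no arguments inspects the current environment. Printing the returned list before invoking `spark-submit2.cmd` may surface a misplaced `winutils.exe` earlier than a failing test run would.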
