From: bzz@apache.org
To: commits@zeppelin.incubator.apache.org
Date: Tue, 05 Apr 2016 07:35:39 -0000
In-Reply-To: <7886f75f58074cb8b68dc1fb58ad1423@git.apache.org>
References: <7886f75f58074cb8b68dc1fb58ad1423@git.apache.org>
X-Mailer: ASF-Git Admin Mailer
Subject: [3/3] incubator-zeppelin git commit: R Interpreter for Zeppelin

R Interpreter for Zeppelin

This is the initial PR for an R Interpreter for Zeppelin. There's still some work to be done (e.g., tests), but it's usable, and it brings to Zeppelin features from R like its library of statistics and machine learning packages, as well as advanced interactive visualizations. So I'd like to open it up for others to comment and/or become involved.

Summary:

- There are two interpreters: one emulates a REPL, the other uses knitr to weave markdown and formatted R output. The two interpreters share a single execution environment.
- Visualizations: Besides R's own graphics, this also supports interactive visualizations with googleVis and rCharts. I am working on htmlwidgets (almost done) with the author of that package, and a next-step project is to get Shiny/ggvis working. Sometimes a visualization won't load until the page is reloaded. I'm not sure why this is.
- Licensing: To talk to R, this integrates code forked from rScala. rScala was released with a BSD-license option, and the author's permission was obtained.
- Spark: Getting R to share a single spark context with the Spark interpreter group is going to be a project.
For right now, the R interpreters live in their own "r" interpreter group, and new spark contexts are created on startup.
- Zeppelin Context: Not yet integrated, in significant part because there's no ZeppelinContext to talk to until it lives in the Spark interpreter group.
- Documentation: A notebook is included that demonstrates what the interpreter does and how to use it.
- Tests: Working on it...

P.S.: This is my first PR on a project of this size; let me know what I messed up and I'll try to fix it ASAP.

Author: Amos Elb
Author: Amos B. Elberg

Closes #208 from elbamos/rinterpreter and squashes the following commits:

ffc1a25 [Amos Elb] Fix rat issue
a08ec5b [Amos B. Elberg] R Interpreter

Project: http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/commit/d5e87fb8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/tree/d5e87fb8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/diff/d5e87fb8

Branch: refs/heads/master
Commit: d5e87fb8ba98f08db5b0a4995104ce19f182c678
Parents: b51af33
Author: Amos Elb
Authored: Mon Apr 4 13:29:41 2016 -0400
Committer: Alexander Bezzubov
Committed: Tue Apr 5 16:35:18 2016 +0900

----------------------------------------------------------------------
 .travis.yml | 25 +-
 LICENSE | 14 +-
 bin/interpreter.sh | 7 +-
 conf/zeppelin-site.xml.template | 2 +-
 docs/interpreter/r.md | 100 ++++
 docs/interpreter/screenshots/backtoscala.png | Bin 0 -> 36308 bytes
 docs/interpreter/screenshots/knitgeo.png | Bin 0 -> 59594 bytes
 docs/interpreter/screenshots/knitmotion.png | Bin 0 -> 33468 bytes
 docs/interpreter/screenshots/knitstock.png | Bin 0 -> 108868 bytes
 docs/interpreter/screenshots/repl2plus2.png | Bin 0 -> 13143 bytes
 docs/interpreter/screenshots/replhead.png | Bin 0 -> 42923 bytes
 docs/interpreter/screenshots/replhist.png | Bin 0 -> 31481 bytes
 docs/interpreter/screenshots/sparkrfaithful.png | Bin 0 -> 52235 bytes
 docs/interpreter/screenshots/varr1.png | Bin 0 -> 16703 bytes
 docs/interpreter/screenshots/varr2.png | Bin 0 -> 18973 bytes
 docs/interpreter/screenshots/varscala.png | Bin 0 -> 21073 bytes
 licenses/LICENSE-rscala-1.0.6 | 29 +
 licenses/LICENSE-scala-2.10 | 11 +
 pom.xml | 27 +-
 r/R/install-dev.sh | 41 ++
 r/R/rzeppelin/DESCRIPTION | 28 +
 r/R/rzeppelin/LICENSE | 14 +
 r/R/rzeppelin/NAMESPACE | 7 +
 r/R/rzeppelin/R/common.R | 14 +
 r/R/rzeppelin/R/globals.R | 3 +
 r/R/rzeppelin/R/protocol.R | 35 ++
 r/R/rzeppelin/R/rServer.R | 214 ++++++++
 r/R/rzeppelin/R/rzeppelin.R | 95 ++++
 r/R/rzeppelin/R/scalaInterpreter.R | 123 +++++
 r/R/rzeppelin/R/zzz.R | 9 +
 r/_tools/checkstyle.xml | 282 ++++++++++
 r/_tools/scalastyle.xml | 146 +++++
 r/pom.xml | 396 ++++++++++++++
 .../org/apache/zeppelin/rinterpreter/KnitR.java | 135 +++++
 .../org/apache/zeppelin/rinterpreter/RRepl.java | 135 +++++
 .../apache/zeppelin/rinterpreter/RStatics.java | 86 +++
 .../org/apache/spark/api/r/RBackendHelper.scala | 84 +++
 .../rinterpreter/KnitRInterpreter.scala | 77 +++
 .../apache/zeppelin/rinterpreter/RContext.scala | 321 +++++++++++
 .../zeppelin/rinterpreter/RInterpreter.scala | 167 ++++++
 .../rinterpreter/RReplInterpreter.scala | 98 ++++
 .../apache/zeppelin/rinterpreter/package.scala | 29 +
 .../zeppelin/rinterpreter/rscala/Package.scala | 39 ++
 .../zeppelin/rinterpreter/rscala/RClient.scala | 527 +++++++++++++++++++
 .../rinterpreter/rscala/RException.scala | 31 ++
 r/src/main/scala/scala/Console.scala | 491 +++++++++++++++++
 .../apache/spark/api/r/RBackendHelperTest.scala | 49 ++
 .../rinterpreter/RContextInitTest.scala | 113 ++++
 .../zeppelin/rinterpreter/RContextTest.scala | 115 ++++
 .../rinterpreter/RInterpreterTest.scala | 141 +++++
 .../zeppelin/rinterpreter/WrapperTest.scala | 103 ++++
 .../apache/zeppelin/rinterpreter/package.scala | 23 +
 spark/pom.xml | 1 -
 .../org/apache/zeppelin/spark/SparkVersion.java | 5 +-
 .../zeppelin/rest/ZeppelinSparkClusterTest.java | 364 ++++++-------
 zeppelin-web/bower.json | 1
+ zeppelin-web/pom.xml | 1 + zeppelin-web/src/index.html | 1 + zeppelin-web/test/karma.conf.js | 1 + .../zeppelin/conf/ZeppelinConfiguration.java | 4 +- 60 files changed, 4572 insertions(+), 192 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/.travis.yml ---------------------------------------------------------------------- diff --git a/.travis.yml b/.travis.yml index 72b748e..ba66bae 100644 --- a/.travis.yml +++ b/.travis.yml @@ -16,23 +16,24 @@ language: java sudo: false + cache: directories: - .spark-dist - + matrix: include: # Test all modules - jdk: "oraclejdk7" - env: SPARK_VER="1.6.1" HADOOP_VER="2.3" PROFILE="-Pspark-1.6 -Phadoop-2.3 -Ppyspark -Pscalding" BUILD_FLAG="package -Pbuild-distr" TEST_FLAG="verify -Pusing-packaged-distr" TEST_PROJECTS="" + env: SPARK_VER="1.6.1" HADOOP_VER="2.3" PROFILE="-Pspark-1.6 -Pr -Phadoop-2.3 -Ppyspark -Pscalding" BUILD_FLAG="package -Pbuild-distr" TEST_FLAG="verify -Pusing-packaged-distr" TEST_PROJECTS="" # Test spark module for 1.5.2 - jdk: "oraclejdk7" - env: SPARK_VER="1.5.2" HADOOP_VER="2.3" PROFILE="-Pspark-1.5 -Phadoop-2.3 -Ppyspark" BUILD_FLAG="package -DskipTests" TEST_FLAG="verify" TEST_PROJECTS="-pl zeppelin-interpreter,zeppelin-zengine,zeppelin-server,zeppelin-display,spark-dependencies,spark -Dtest=org.apache.zeppelin.rest.*Test,org.apache.zeppelin.spark* -DfailIfNoTests=false" + env: SPARK_VER="1.5.2" HADOOP_VER="2.3" PROFILE="-Pspark-1.5 -Pr -Phadoop-2.3 -Ppyspark" BUILD_FLAG="package -DskipTests" TEST_FLAG="verify" TEST_PROJECTS="-pl zeppelin-interpreter,zeppelin-zengine,zeppelin-server,zeppelin-display,spark-dependencies,spark,r -Dtest=org.apache.zeppelin.rest.*Test,org.apache.zeppelin.spark* -DfailIfNoTests=false" # Test spark module for 1.4.1 - jdk: "oraclejdk7" - env: SPARK_VER="1.4.1" HADOOP_VER="2.3" PROFILE="-Pspark-1.4 -Phadoop-2.3 -Ppyspark" BUILD_FLAG="package -DskipTests" TEST_FLAG="verify" 
TEST_PROJECTS="-pl zeppelin-interpreter,zeppelin-zengine,zeppelin-server,zeppelin-display,spark-dependencies,spark -Dtest=org.apache.zeppelin.rest.*Test,org.apache.zeppelin.spark* -DfailIfNoTests=false" + env: SPARK_VER="1.4.1" HADOOP_VER="2.3" PROFILE="-Pspark-1.4 -Pr -Phadoop-2.3 -Ppyspark" BUILD_FLAG="package -DskipTests" TEST_FLAG="verify" TEST_PROJECTS="-pl zeppelin-interpreter,zeppelin-zengine,zeppelin-server,zeppelin-display,spark-dependencies,spark,r -Dtest=org.apache.zeppelin.rest.*Test,org.apache.zeppelin.spark* -DfailIfNoTests=false" # Test spark module for 1.3.1 - jdk: "oraclejdk7" @@ -46,12 +47,24 @@ matrix: - jdk: "oraclejdk7" env: SPARK_VER="1.1.1" HADOOP_VER="2.3" PROFILE="-Pspark-1.1 -Phadoop-2.3 -Ppyspark" BUILD_FLAG="package -DskipTests" TEST_FLAG="verify" TEST_PROJECTS="-pl zeppelin-interpreter,zeppelin-zengine,zeppelin-server,zeppelin-display,spark-dependencies,spark -Dtest=org.apache.zeppelin.rest.*Test,org.apache.zeppelin.spark* -DfailIfNoTests=false" - # Test selenium with spark module for 1.6.0 + # Test selenium with spark module for 1.6.1 - jdk: "oraclejdk7" - env: TEST_SELENIUM="true" SPARK_VER="1.6.0" HADOOP_VER="2.3" PROFILE="-Pspark-1.6 -Phadoop-2.3 -Ppyspark" BUILD_FLAG="package -DskipTests" TEST_FLAG="verify" TEST_PROJECTS="-pl zeppelin-interpreter,zeppelin-zengine,zeppelin-server,zeppelin-display,spark-dependencies,spark -Dtest=org.apache.zeppelin.AbstractFunctionalSuite -DfailIfNoTests=false" + env: TEST_SELENIUM="true" SPARK_VER="1.6.1" HADOOP_VER="2.3" PROFILE="-Pspark-1.6 -Phadoop-2.3 -Ppyspark" BUILD_FLAG="package -DskipTests" TEST_FLAG="verify" TEST_PROJECTS="-pl zeppelin-interpreter,zeppelin-zengine,zeppelin-server,zeppelin-display,spark-dependencies,spark -Dtest=org.apache.zeppelin.AbstractFunctionalSuite -DfailIfNoTests=false" + +addons: + apt: + sources: + - r-packages-precise + packages: + - r-base-dev + - r-cran-evaluate + - r-cran-base64enc before_install: - "ls -la .spark-dist" + - mkdir -p ~/R + - R -e 
"install.packages('knitr', repos = 'http://cran.us.r-project.org', lib='~/R')" + - export R_LIBS='~/R' - "export DISPLAY=:99.0" - "sh -e /etc/init.d/xvfb start" http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/LICENSE ---------------------------------------------------------------------- diff --git a/LICENSE b/LICENSE index f609422..db076c6 100644 --- a/LICENSE +++ b/LICENSE @@ -244,4 +244,16 @@ Apache licenses The following components are provided under the Apache License. See project link for details. The text of each license is also included at licenses/LICENSE-[project]-[version].txt. - (Apache 2.0) Bootstrap v3.0.2 (http://getbootstrap.com/) - https://github.com/twbs/bootstrap/blob/v3.0.2/LICENSE \ No newline at end of file + (Apache 2.0) Bootstrap v3.0.2 (http://getbootstrap.com/) - https://github.com/twbs/bootstrap/blob/v3.0.2/LICENSE + +======================================================================== +BSD 3-Clause licenses +======================================================================== +The following components are provided under the BSD 3-Clause license. See file headers and project links for details. 
+ + (BSD 3 Clause) portions of rscala 1.0.6 (https://dahl.byu.edu/software/rscala/) - https://cran.r-project.org/web/packages/rscala/index.html + r/R/rzeppelin/R/{common.R, globals.R,protocol.R,rServer.R,scalaInterpreter.R,zzz.R } + r/src/main/scala/org/apache/zeppelin/rinterpreter/rscala/{Package.scala, RClient.scala} + + (BSD 3 Clause) portions of Scala (http://www.scala-lang.org/download) - http://www.scala-lang.org/download/#License + r/src/main/scala/scala/Console.scala \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/bin/interpreter.sh ---------------------------------------------------------------------- diff --git a/bin/interpreter.sh b/bin/interpreter.sh index 69c94f6..17c9028 100755 --- a/bin/interpreter.sh +++ b/bin/interpreter.sh @@ -85,7 +85,10 @@ if [[ "${INTERPRETER_ID}" == "spark" ]]; then export SPARK_SUBMIT="${SPARK_HOME}/bin/spark-submit" SPARK_APP_JAR="$(ls ${ZEPPELIN_HOME}/interpreter/spark/zeppelin-spark*.jar)" # This will evantually passes SPARK_APP_JAR to classpath of SparkIMain - ZEPPELIN_CLASSPATH+=${SPARK_APP_JAR} + ZEPPELIN_CLASSPATH=${SPARK_APP_JAR} + # Need to add the R Interpreter + RZEPPELINPATH="$(ls ${ZEPPELIN_HOME}/interpreter/spark/zeppelin-zr*.jar)" + ZEPPELIN_CLASSPATH="${ZEPPELIN_CLASSPATH}:${RZEPPELINPATH}" pattern="$SPARK_HOME/python/lib/py4j-*-src.zip" py4j=($pattern) @@ -130,6 +133,8 @@ if [[ "${INTERPRETER_ID}" == "spark" ]]; then ZEPPELIN_CLASSPATH+=":${HADOOP_CONF_DIR}" fi + RZEPPELINPATH="$(ls ${ZEPPELIN_HOME}/interpreter/spark/zeppelin-zr*.jar)" + ZEPPELIN_CLASSPATH="${ZEPPELIN_CLASSPATH}:${RZEPPELINPATH}" export SPARK_CLASSPATH+=":${ZEPPELIN_CLASSPATH}" fi elif [[ "${INTERPRETER_ID}" == "hbase" ]]; then http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/conf/zeppelin-site.xml.template ---------------------------------------------------------------------- diff --git a/conf/zeppelin-site.xml.template b/conf/zeppelin-site.xml.template index 
93d0495..f475f87 100755
--- a/conf/zeppelin-site.xml.template
+++ b/conf/zeppelin-site.xml.template
@@ -144,7 +144,7 @@
 zeppelin.interpreters
- org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.angular.AngularInterpreter,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,org.apache.zeppelin.tajo.TajoInterpreter,org.apache.zeppelin.file.HDFSFileInterpreter,org.apache.zeppelin.flink.FlinkInterpreter,org.apache.zeppelin.lens.LensInterpreter,org.apache.zeppelin.ignite.IgniteInterpreter,org.apache.zeppelin.ignite.IgniteSqlInterpreter,org.apache.zeppelin.cassandra.CassandraInterpreter,org.apache.zeppelin.geode.GeodeOqlInterpreter,org.apache.zeppelin.postgresql.PostgreSqlInterpreter,org.apache.zeppelin.jdbc.JDBCInterpreter,org.apache.zeppelin.phoenix.PhoenixInterpreter,org.apache.zeppelin.kylin.KylinInterpreter,org.apache.zeppelin.elasticsearch.ElasticsearchInterpreter,org.apache.zeppelin.scalding.ScaldingInterpreter,org.apache.zeppelin.alluxio.AlluxioInterpreter,org.apache.zeppelin.hbase.HbaseInterpreter
+ org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.angular.AngularInterpreter,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,org.apache.zeppelin.tajo.TajoInterpreter,org.apache.zeppelin.file.HDFSFileInterpreter,org.apache.zeppelin.flink.FlinkInterpreter,org.apache.zeppelin.lens.LensInterpreter,org.apache.zeppelin.ignite.IgniteInterpreter,org.apache.zeppelin.ignite.IgniteSqlInterpreter,org.apache.zeppelin.cassandra.CassandraInterpreter,org.apache.zeppelin.geode.GeodeOqlInterpreter,org.apache.zeppelin.postgresql.PostgreSqlInterpreter,org.apache.zeppelin.jdbc.JDBCInterpreter,org.apache.zeppelin.phoenix.PhoenixInterpreter,org.apache.zeppelin.kylin.KylinInterpreter,org.apache.zeppelin.elasticsearch.ElasticsearchInterpreter,org.apache.zeppelin.scalding.ScaldingInterpreter,org.apache.zeppelin.alluxio.AlluxioInterpreter,org.apache.zeppelin.hbase.HbaseInterpreter,org.apache.zeppelin.rinterpreter.KnitR,org.apache.zeppelin.rinterpreter.RRepl
 Comma separated interpreter configurations. First interpreter become a default

http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/docs/interpreter/r.md
----------------------------------------------------------------------
diff --git a/docs/interpreter/r.md b/docs/interpreter/r.md
new file mode 100644
index 0000000..9b893ad
--- /dev/null
+++ b/docs/interpreter/r.md
@@ -0,0 +1,100 @@
+---
+layout: page
+title: "R Interpreter"
+description: ""
+group: manual
+---
+{% include JB/setup %}
+
+## R Interpreter
+
+This is the Apache (incubating) Zeppelin project, with the addition of support for the R programming language and R-Spark integration.
+
+### Requirements
+
+Additional requirements for the R interpreter are:
+
+ * R 3.1 or later (earlier versions may work, but have not been tested)
+ * The `evaluate` R package.
+
+For full R support, you will also need the following R packages:
+
+ * `knitr`
+ * `repr` -- available with `devtools::install_github("IRkernel/repr")`
+ * `htmltools` -- required for some interactive plotting
+ * `base64enc` -- required to view R base plots
+
+### Configuration
+
+To run Zeppelin with the R Interpreter, the `SPARK_HOME` environment variable must be set. The best way to do this is by editing `conf/zeppelin-env.sh`.
+
+If it is not set, the R Interpreter will not be able to interface with Spark.
+
+You should also copy `conf/zeppelin-site.xml.template` to `conf/zeppelin-site.xml`. That will ensure that Zeppelin sees the R Interpreter the first time it starts up.
+
+### Using the R Interpreter
+
+By default, the R Interpreter appears as two Zeppelin Interpreters, `%r` and `%knitr`.
+
+`%r` behaves like an ordinary REPL. You can execute commands as in the CLI.
+
+[![2+2](screenshots/repl2plus2.png)](screenshots/repl2plus2.png)
+
+R base plotting is fully supported.
+
+[![replhist](screenshots/replhist.png)](screenshots/replhist.png)
+
+If you return a data.frame, Zeppelin will attempt to display it using Zeppelin's built-in visualizations.
+
+[![replhead](screenshots/replhead.png)](screenshots/replhead.png)
+
+`%knitr` interfaces directly against `knitr`, with chunk options on the first line:
+
+[![knitgeo](screenshots/knitgeo.png)](screenshots/knitgeo.png)
+[![knitstock](screenshots/knitstock.png)](screenshots/knitstock.png)
+[![knitmotion](screenshots/knitmotion.png)](screenshots/knitmotion.png)
+
+The two interpreters share the same environment. If you define a variable from `%r`, it will be in scope if you then make a call using `%knitr`.
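Editorial sketch, not part of the commit: the two configuration steps described in the Configuration section (setting `SPARK_HOME` in `conf/zeppelin-env.sh` and copying the site template) could be scripted roughly as follows. The function name `configure_r_interpreter` and both of its arguments are hypothetical, and the sketch assumes the standard `conf/` layout named in the text.

```shell
#!/bin/sh
# Sketch only: the function name and both arguments are hypothetical.
#   $1 = Zeppelin install directory, $2 = Spark install directory.
configure_r_interpreter() {
  cri_conf_dir="$1/conf"
  cri_spark_home="$2"

  # Copy the site template once; an existing zeppelin-site.xml is left untouched.
  if [ ! -f "${cri_conf_dir}/zeppelin-site.xml" ]; then
    cp "${cri_conf_dir}/zeppelin-site.xml.template" "${cri_conf_dir}/zeppelin-site.xml"
  fi

  # Append a SPARK_HOME export to zeppelin-env.sh unless one is already present.
  if ! grep -q '^export SPARK_HOME=' "${cri_conf_dir}/zeppelin-env.sh" 2>/dev/null; then
    printf 'export SPARK_HOME="%s"\n' "${cri_spark_home}" >> "${cri_conf_dir}/zeppelin-env.sh"
  fi
}
```

A call might look like `configure_r_interpreter /path/to/zeppelin /path/to/spark`, with both paths replaced by the local installation directories. The function is written to be safe to run twice: it neither overwrites an existing site file nor adds a second export line.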
+
+### Using SparkR & Moving Between Languages
+
+If `SPARK_HOME` is set, the `SparkR` package will be loaded automatically:
+
+[![sparkrfaithful](screenshots/sparkrfaithful.png)](screenshots/sparkrfaithful.png)
+
+The Spark Context and SQL Context are created and injected into the local environment automatically as `sc` and `sql`.
+
+The same contexts are shared with the `%spark`, `%sql` and `%pyspark` interpreters:
+
+[![backtoscala](screenshots/backtoscala.png)](screenshots/backtoscala.png)
+
+You can also make an ordinary R variable accessible in Scala and Python:
+
+[![varr1](screenshots/varr1.png)](screenshots/varr1.png)
+
+And vice versa:
+
+[![varscala](screenshots/varscala.png)](screenshots/varscala.png)
+[![varr2](screenshots/varr2.png)](screenshots/varr2.png)
+
+### Caveats & Troubleshooting
+
+* Almost all issues with the R interpreter turned out to be caused by an incorrectly set `SPARK_HOME`. The R interpreter must load a version of the `SparkR` package that matches the running version of Spark, and it does this by searching `SPARK_HOME`. If Zeppelin isn't configured to interface with Spark in `SPARK_HOME`, the R interpreter will not be able to connect to Spark.
+
+* The `knitr` environment is persistent. If you run a chunk from Zeppelin that changes a variable, then run the same chunk again, the variable has already been changed. Use immutable variables.
+
+* Note that `%spark.r` and `%r` are two different ways of calling the same interpreter, as are `%spark.knitr` and `%knitr`. By default, Zeppelin puts the R interpreters in the `%spark.` Interpreter Group.
+
+* Using the `%r` interpreter, if you return a data.frame, HTML, or an image, it will dominate the result. So if you execute three commands, and one is `hist()`, all you will see is the histogram, not the results of the other commands. This is a Zeppelin limitation.
+
+* If you return a data.frame (for instance, from calling `head()`) from the `%spark.r` interpreter, it will be parsed by Zeppelin's built-in data visualization system.
+
+* Why `knitr` instead of `rmarkdown`? Why no `htmlwidgets`? In order to support `htmlwidgets`, which has indirect dependencies, `rmarkdown` uses `pandoc`, which requires writing to and reading from disk. This makes it many times slower than `knitr`, which can operate entirely in RAM.
+
+* Why no `ggvis` or `shiny`? Supporting `shiny` would require integrating a reverse-proxy into Zeppelin, which is a significant task.
+
+* Mac OS X & case-insensitive filesystem. If you try to install on a case-insensitive filesystem, which is the Mac OS X default, maven can unintentionally delete the install directory because `r` and `R` become the same subdirectory.
+
+* Error `unable to start device X11` with the repl interpreter. Check your shell login scripts to see if they are adjusting the `DISPLAY` environment variable. This is common on some operating systems as a workaround for ssh issues, but can interfere with R plotting.
+
+* akka Library Version or `TTransport` errors. This can happen if you try to run Zeppelin with a `SPARK_HOME` that has a version of Spark other than the one specified with `-Pspark-1.x` when Zeppelin was compiled.
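Editorial sketch, not part of the commit: two of the caveats above (an incorrectly set `SPARK_HOME`, and building on a case-insensitive filesystem) can be checked before starting Zeppelin. Both function names are hypothetical. The `R/lib/SparkR` location is an assumption based on the usual layout of Spark binary distributions; the commit text only says the interpreter searches `SPARK_HOME`.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical diagnostics, not Zeppelin code.

# Reports whether SPARK_HOME is set and appears to contain the SparkR package.
# Assumption: SparkR lives under R/lib/SparkR inside the Spark distribution.
check_spark_home() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set"
    return 1
  fi
  if [ ! -d "${SPARK_HOME}/R/lib/SparkR" ]; then
    echo "no SparkR package under ${SPARK_HOME}/R/lib"
    return 1
  fi
  echo "SparkR found under ${SPARK_HOME}/R/lib"
}

# Returns 0 if directory $1 is on a case-insensitive filesystem, 1 if it is
# case-sensitive, and 2 if the probe file could not be created.
fs_is_case_insensitive() {
  fici_probe="$1/zeppelin_case_probe_$$"
  fici_upper="$1/ZEPPELIN_CASE_PROBE_$$"
  : > "${fici_probe}" || return 2
  # If the upper-case spelling resolves to the probe, case is being ignored.
  if [ -e "${fici_upper}" ]; then fici_result=0; else fici_result=1; fi
  rm -f "${fici_probe}"
  return "${fici_result}"
}
```

Running `check_spark_home` in the same environment that launches Zeppelin shows what the interpreter process would see. To compare Spark versions, `"$SPARK_HOME/bin/spark-submit" --version` prints the version of the distribution in `SPARK_HOME`, which can be checked against the `-Pspark-1.x` profile used at build time.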
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/docs/interpreter/screenshots/backtoscala.png ---------------------------------------------------------------------- diff --git a/docs/interpreter/screenshots/backtoscala.png b/docs/interpreter/screenshots/backtoscala.png new file mode 100644 index 0000000..c0c897a Binary files /dev/null and b/docs/interpreter/screenshots/backtoscala.png differ http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/docs/interpreter/screenshots/knitgeo.png ---------------------------------------------------------------------- diff --git a/docs/interpreter/screenshots/knitgeo.png b/docs/interpreter/screenshots/knitgeo.png new file mode 100644 index 0000000..d1eb0d8 Binary files /dev/null and b/docs/interpreter/screenshots/knitgeo.png differ http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/docs/interpreter/screenshots/knitmotion.png ---------------------------------------------------------------------- diff --git a/docs/interpreter/screenshots/knitmotion.png b/docs/interpreter/screenshots/knitmotion.png new file mode 100644 index 0000000..a1048ea Binary files /dev/null and b/docs/interpreter/screenshots/knitmotion.png differ http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/docs/interpreter/screenshots/knitstock.png ---------------------------------------------------------------------- diff --git a/docs/interpreter/screenshots/knitstock.png b/docs/interpreter/screenshots/knitstock.png new file mode 100644 index 0000000..7a27c60 Binary files /dev/null and b/docs/interpreter/screenshots/knitstock.png differ http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/docs/interpreter/screenshots/repl2plus2.png ---------------------------------------------------------------------- diff --git a/docs/interpreter/screenshots/repl2plus2.png b/docs/interpreter/screenshots/repl2plus2.png new file mode 100644 index 
0000000..8f70092 Binary files /dev/null and b/docs/interpreter/screenshots/repl2plus2.png differ http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/docs/interpreter/screenshots/replhead.png ---------------------------------------------------------------------- diff --git a/docs/interpreter/screenshots/replhead.png b/docs/interpreter/screenshots/replhead.png new file mode 100644 index 0000000..b09ccab Binary files /dev/null and b/docs/interpreter/screenshots/replhead.png differ http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/docs/interpreter/screenshots/replhist.png ---------------------------------------------------------------------- diff --git a/docs/interpreter/screenshots/replhist.png b/docs/interpreter/screenshots/replhist.png new file mode 100644 index 0000000..5291404 Binary files /dev/null and b/docs/interpreter/screenshots/replhist.png differ http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/docs/interpreter/screenshots/sparkrfaithful.png ---------------------------------------------------------------------- diff --git a/docs/interpreter/screenshots/sparkrfaithful.png b/docs/interpreter/screenshots/sparkrfaithful.png new file mode 100644 index 0000000..ec956c7 Binary files /dev/null and b/docs/interpreter/screenshots/sparkrfaithful.png differ http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/docs/interpreter/screenshots/varr1.png ---------------------------------------------------------------------- diff --git a/docs/interpreter/screenshots/varr1.png b/docs/interpreter/screenshots/varr1.png new file mode 100644 index 0000000..ac997a8 Binary files /dev/null and b/docs/interpreter/screenshots/varr1.png differ http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/docs/interpreter/screenshots/varr2.png ---------------------------------------------------------------------- diff --git a/docs/interpreter/screenshots/varr2.png 
b/docs/interpreter/screenshots/varr2.png new file mode 100644 index 0000000..b49988d Binary files /dev/null and b/docs/interpreter/screenshots/varr2.png differ http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/docs/interpreter/screenshots/varscala.png ---------------------------------------------------------------------- diff --git a/docs/interpreter/screenshots/varscala.png b/docs/interpreter/screenshots/varscala.png new file mode 100644 index 0000000..7f95ad2 Binary files /dev/null and b/docs/interpreter/screenshots/varscala.png differ http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/licenses/LICENSE-rscala-1.0.6 ---------------------------------------------------------------------- diff --git a/licenses/LICENSE-rscala-1.0.6 b/licenses/LICENSE-rscala-1.0.6 new file mode 100644 index 0000000..e0d577f --- /dev/null +++ b/licenses/LICENSE-rscala-1.0.6 @@ -0,0 +1,29 @@ +Copyright (c) 2013-2015, David B. Dahl, Brigham Young University + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + + Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + + Neither the name of the nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/licenses/LICENSE-scala-2.10 ---------------------------------------------------------------------- diff --git a/licenses/LICENSE-scala-2.10 b/licenses/LICENSE-scala-2.10 new file mode 100644 index 0000000..90d7530 --- /dev/null +++ b/licenses/LICENSE-scala-2.10 @@ -0,0 +1,11 @@ +Copyright (c) 2002-2016 EPFL +Copyright (c) 2011-2016 Lightbend, Inc. (formerly Typesafe, Inc.) + +All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: + +Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. +Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. +Neither the name of the EPFL nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/pom.xml ---------------------------------------------------------------------- diff --git a/pom.xml b/pom.xml index e5f7d9a..76f4a31 100755 --- a/pom.xml +++ b/pom.xml @@ -303,7 +303,6 @@ org/apache/zeppelin/interpreter/thrift/* - @@ -460,6 +459,7 @@ .github/* .gitignore .repository/ + .Rhistory **/*.diff **/*.patch **/*.avsc @@ -513,6 +513,7 @@ docs/Rakefile docs/rss.xml docs/sitemap.txt + **/dependency-reduced-pom.xml docs/assets/themes/zeppelin/css/syntax.css @@ -520,6 +521,23 @@ docs/_site/** docs/Gemfile.lock + + + R/lib/** + + + r/R/rzeppelin/R/globals.R + r/R/rzeppelin/R/common.R + r/R/rzeppelin/R/protocol.R + r/R/rzeppelin/R/rServer.R + r/R/rzeppelin/R/scalaInterpreter.R + r/R/rzeppelin/R/zzz.R + r/src/main/scala/scala/Console.scala + r/src/main/scala/org/apache/zeppelin/rinterpreter/rscala/Package.scala + r/src/main/scala/org/apache/zeppelin/rinterpreter/rscala/RClient.scala + + r/R/rzeppelin/DESCRIPTION + r/R/rzeppelin/NAMESPACE @@ -675,6 +693,13 @@ + r + + r + + + + scalding scalding http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/R/install-dev.sh ---------------------------------------------------------------------- diff --git a/r/R/install-dev.sh b/r/R/install-dev.sh new file mode 100755 index 0000000..a3b5224 --- /dev/null +++ b/r/R/install-dev.sh @@ -0,0 +1,41 @@ +#!/bin/bash +# +# Licensed to the Apache Software 
Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# This script packages R files to create a package that can be loaded into R, +# and also installs necessary packages. + + +set -o pipefail +set -e +set -x + +FWDIR="$(cd "$(dirname "$0")"; pwd)" +LIB_DIR="$FWDIR/../../R/lib" + +mkdir -p "$LIB_DIR" + +pushd "$FWDIR" > /dev/null + +# Generate Rd files if devtools is installed +#Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtools); devtools::document(pkg="./pkg", roclets=c("rd")) }' + +# Install rzeppelin to $LIB_DIR +R CMD INSTALL --library="$LIB_DIR" "$FWDIR/rzeppelin/" + +popd > /dev/null +set +x \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/R/rzeppelin/DESCRIPTION ---------------------------------------------------------------------- diff --git a/r/R/rzeppelin/DESCRIPTION b/r/R/rzeppelin/DESCRIPTION new file mode 100644 index 0000000..b34f5cc --- /dev/null +++ b/r/R/rzeppelin/DESCRIPTION @@ -0,0 +1,28 @@ +Package: rzeppelin +Type: Package +Title: Interface from scala to R, based on rscala, for the Apache (Incubation) Zeppelin project +Version: 0.1.0 +Date: 2015-12-01 +Authors@R: c(person(given="David B.",family="Dahl",role=c("aut","cre"),email="dahl@stat.byu.edu"), + person(family="Scala
developers",role="ctb",comment="see http://scala-lang.org/")) +URL: http://dahl.byu.edu/software/rscala/ +Imports: utils, + evaluate +Suggests: + googleVis, + htmltools, + knitr, + rCharts, + repr, + SparkR, + base64enc +SystemRequirements: Scala (>= 2.10) +Description: +License: file LICENSE +NeedsCompilation: no +Packaged: 2015-05-15 13:36:01 UTC; dahl +Author: David B. Dahl [aut, cre], + Scala developers [ctb] (see http://scala-lang.org/) +Maintainer: Amos B. Elberg +Repository: +Date/Publication: 2015-12-01 21:50:02 http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/R/rzeppelin/LICENSE ---------------------------------------------------------------------- diff --git a/r/R/rzeppelin/LICENSE b/r/R/rzeppelin/LICENSE new file mode 100644 index 0000000..0ed96c4 --- /dev/null +++ b/r/R/rzeppelin/LICENSE @@ -0,0 +1,14 @@ +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to You under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License.
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/R/rzeppelin/NAMESPACE ---------------------------------------------------------------------- diff --git a/r/R/rzeppelin/NAMESPACE b/r/R/rzeppelin/NAMESPACE new file mode 100644 index 0000000..8afdfe6 --- /dev/null +++ b/r/R/rzeppelin/NAMESPACE @@ -0,0 +1,7 @@ +import(utils) + +export("rzeppelinPackage") +export("progress_zeppelin") +export(.z.put) +export(.z.get) +export(.z.input) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/R/rzeppelin/R/common.R ---------------------------------------------------------------------- diff --git a/r/R/rzeppelin/R/common.R b/r/R/rzeppelin/R/common.R new file mode 100644 index 0000000..a52e22e --- /dev/null +++ b/r/R/rzeppelin/R/common.R @@ -0,0 +1,14 @@ +strintrplt <- function(snippet,envir=parent.frame()) { + if ( ! is.character(snippet) ) stop("Character vector expected.") + if ( length(snippet) != 1 ) stop("Length of vector must be exactly one.") + m <- regexpr("@\\{([^\\}]+)\\}",snippet) + if ( m != -1 ) { + s1 <- substr(snippet,1,m-1) + s2 <- substr(snippet,m+2,m+attr(m,"match.length")-2) + s3 <- substr(snippet,m+attr(m,"match.length"),nchar(snippet)) + strintrplt(paste(s1,paste(toString(eval(parse(text=s2),envir=envir)),collapse=" ",sep=""),s3,sep=""),envir) + } else snippet +} + + + http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/R/rzeppelin/R/globals.R ---------------------------------------------------------------------- diff --git a/r/R/rzeppelin/R/globals.R b/r/R/rzeppelin/R/globals.R new file mode 100644 index 0000000..17b59aa --- /dev/null +++ b/r/R/rzeppelin/R/globals.R @@ -0,0 +1,3 @@ +lEtTeRs <- c(letters,LETTERS) +alphabet <- c(lEtTeRs,0:9) + http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/R/rzeppelin/R/protocol.R ---------------------------------------------------------------------- diff --git 
a/r/R/rzeppelin/R/protocol.R b/r/R/rzeppelin/R/protocol.R new file mode 100644 index 0000000..0fe07e2 --- /dev/null +++ b/r/R/rzeppelin/R/protocol.R @@ -0,0 +1,35 @@ +UNSUPPORTED_TYPE <- 0L +INTEGER <- 1L +DOUBLE <- 2L +BOOLEAN <- 3L +STRING <- 4L +DATE <- 5L +DATETIME <- 6L +UNSUPPORTED_STRUCTURE <- 10L +NULLTYPE <- 11L +REFERENCE <- 12L +ATOMIC <- 13L +VECTOR <- 14L +MATRIX <- 15L +LIST <- 16L +DATAFRAME <- 17L +S3CLASS <- 18L +S4CLASS <- 19L +JOBJ <- 20L +EXIT <- 100L +RESET <- 101L +GC <- 102L +DEBUG <- 103L +EVAL <- 104L +SET <- 105L +SET_SINGLE <- 106L +SET_DOUBLE <- 107L +GET <- 108L +GET_REFERENCE <- 109L +DEF <- 110L +INVOKE <- 111L +SCALAP <- 112L +OK <- 1000L +ERROR <- 1001L +UNDEFINED_IDENTIFIER <- 1002L +CURRENT_SUPPORTED_SCALA_VERSION <- "2.10" http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/R/rzeppelin/R/rServer.R ---------------------------------------------------------------------- diff --git a/r/R/rzeppelin/R/rServer.R b/r/R/rzeppelin/R/rServer.R new file mode 100644 index 0000000..af74d7d --- /dev/null +++ b/r/R/rzeppelin/R/rServer.R @@ -0,0 +1,214 @@ +rServe <- function(sockets) { + cc(sockets) + workspace <- sockets[['workspace']] + debug <- get("debug",envir=sockets[['env']]) + while ( TRUE ) { + if ( debug ) cat("R DEBUG: Top of the loop waiting for a command.\n") + cmd <- rb(sockets,integer(0)) + if ( cmd == EXIT ) { + if ( debug ) cat("R DEBUG: Got EXIT\n") + return() + } else if ( cmd == DEBUG ) { + if ( debug ) cat("R DEBUG: Got DEBUG\n") + newDebug <- ( rb(sockets,integer(0)) != 0 ) + if ( debug != newDebug ) cat("R DEBUG: Debugging is now ",newDebug,"\n",sep="") + debug <- newDebug + assign("debug",debug,envir=sockets[['env']]) + } else if ( cmd == EVAL ) { + if ( debug ) cat("R DEBUG: Got EVAL\n") + snippet <- rc(sockets) + output <- capture.output(result <- try(eval(parse(text=snippet),envir=workspace))) + if ( inherits(result,"try-error") ) { + wb(sockets,ERROR) + msg <- 
paste(c(output,attr(result,"condition")$message),collapse="\n") + wc(sockets,msg) + } else { + wb(sockets,OK) + output <- paste(output,collapse="\n") + wc(sockets,output) + } + assign(".rzeppelin.last.value",result,envir=workspace) + } else if ( cmd %in% c(SET,SET_SINGLE,SET_DOUBLE) ) { + if ( debug ) cat("R DEBUG: Got SET\n") + if ( cmd != SET ) index <- rc(sockets) + identifier <- rc(sockets) + dataStructure <- rb(sockets,integer(0)) + if ( dataStructure == NULLTYPE ) { + if ( cmd == SET ) assign(identifier,NULL,envir=workspace) + else subassign(sockets,identifier,index,NULL,cmd==SET_SINGLE) + } else if ( dataStructure == ATOMIC ) { + dataType <- rb(sockets,integer(0)) + if ( dataType == INTEGER ) value <- rb(sockets,integer(0)) + else if ( dataType == DOUBLE ) value <- rb(sockets,double(0)) + else if ( dataType == BOOLEAN ) value <- rb(sockets,integer(0)) != 0 + else if ( dataType == STRING ) value <- rc(sockets) +# else if (dataType == DATE) value <- as.Date(rb(sockets,integer(0)), origin=as.Date("1970-01-01")) + else stop(paste("Unknown data type:",dataType)) + if ( cmd == SET ) assign(identifier,value,envir=workspace) + else subassign(sockets,identifier,index,value,cmd==SET_SINGLE) + } else if ( dataStructure == VECTOR ) { + dataLength <- rb(sockets,integer(0)) + dataType <- rb(sockets,integer(0)) + if ( dataType == INTEGER ) value <- rb(sockets,integer(0),n=dataLength) + else if ( dataType == DOUBLE ) value <- rb(sockets,double(0),n=dataLength) + else if ( dataType == BOOLEAN ) value <- rb(sockets,integer(0),n=dataLength) != 0 + else if ( dataType == STRING ) value <- sapply(1:dataLength,function(i) rc(sockets)) +# else if ( dateType == DATE ) value <- as.Date(rb(sockets,integer(0), n = dataLength), origin=as.Date("1970-01-01")) + else stop(paste("Unknown data type:",dataType)) + if ( cmd == SET ) assign(identifier,value,envir=workspace) + else subassign(sockets,identifier,index,value,cmd==SET_SINGLE) + } else if ( dataStructure == MATRIX ) { + dataNRow <- 
rb(sockets,integer(0)) + dataNCol <- rb(sockets,integer(0)) + dataLength <- dataNRow * dataNCol + dataType <- rb(sockets,integer(0)) + if ( dataType == INTEGER ) value <- matrix(rb(sockets,integer(0),n=dataLength),nrow=dataNRow,byrow=TRUE) + else if ( dataType == DOUBLE ) value <- matrix(rb(sockets,double(0),n=dataLength),nrow=dataNRow,byrow=TRUE) + else if ( dataType == BOOLEAN ) value <- matrix(rb(sockets,integer(0),n=dataLength),nrow=dataNRow,byrow=TRUE) != 0 + else if ( dataType == STRING ) value <- matrix(sapply(1:dataLength,function(i) rc(sockets)),nrow=dataNRow,byrow=TRUE) +# else if ( dateType == DATE) value <- matrix(as.Date(rb(sockets,integer(0),n=dataLength), + # origin = as.Date("1970-01-01")),nrow=dataNRow,byrow=TRUE) + else stop(paste("Unknown data type:",dataType)) + if ( cmd == SET ) assign(identifier,value,envir=workspace) + else subassign(sockets,identifier,index,value,cmd==SET_SINGLE) + } else if ( dataStructure == REFERENCE ) { + otherIdentifier <- rc(sockets) + if ( exists(otherIdentifier,envir=workspace$.) ) { + wb(sockets,OK) + value <- get(otherIdentifier,envir=workspace$.) + if ( cmd == SET ) assign(identifier,value,envir=workspace) + else subassign(sockets,identifier,index,value,cmd==SET_SINGLE) + } else { + wb(sockets,UNDEFINED_IDENTIFIER) + } + } else stop(paste("Unknown data structure:",dataStructure)) + } else if ( cmd == GET ) { + if ( debug ) cat("R DEBUG: Got GET\n") + identifier <- rc(sockets) + value <- tryCatch(get(identifier,envir=workspace),error=function(e) e) + if ( is.null(value) ) { + wb(sockets,NULLTYPE) + } else if ( inherits(value,"error") ) { + wb(sockets,UNDEFINED_IDENTIFIER) + } else if ( ! is.atomic(value) ) { + # This is where code for lists, data.frames, S3, and S4 classes must go + wb(sockets,UNSUPPORTED_STRUCTURE) + } else if ( is.vector(value) ) { + type <- checkType(value) + if ( ( length(value) == 1 ) && ( ! 
get("length.one.as.vector",envir=sockets[['env']]) ) ) { + wb(sockets,ATOMIC) + } else { + wb(sockets,VECTOR) + wb(sockets,length(value)) + } + wb(sockets,type) + if ( type == STRING ) { + if ( length(value) > 0 ) for ( i in 1:length(value) ) wc(sockets,value[i]) + } else { + if ( type == BOOLEAN ) wb(sockets,as.integer(value)) +# else if (type == DATE) wb(sockets,as.integer(value)) + else wb(sockets,value) + } + } else if ( is.matrix(value) ) { + type <- checkType(value) + wb(sockets,MATRIX) + wb(sockets,dim(value)) + wb(sockets,type) + if ( nrow(value) > 0 ) for ( i in 1:nrow(value) ) { + if ( type == STRING ) { + if ( ncol(value) > 0 ) for ( j in 1:ncol(value) ) wc(sockets,value[i,j]) + } + else if ( type == BOOLEAN ) wb(sockets,as.integer(value[i,])) +# else if (type == DATE) wb(sockets, as.integer(value[i,])) + else wb(sockets,value[i,]) + } + } else { + wb(sockets,UNSUPPORTED_STRUCTURE) + } + } else if ( cmd == GET_REFERENCE ) { + if ( debug ) cat("R DEBUG: Got GET_REFERENCE\n") + identifier <- rc(sockets) + value <- tryCatch(get(identifier,envir=workspace),error=function(e) e) + if ( inherits(value,"error") ) { + wb(sockets,UNDEFINED_IDENTIFIER) + } else { + wb(sockets,REFERENCE) + wc(sockets,new.reference(value,workspace$.)) + } + } else if ( cmd == GC ) { + if ( debug ) cat("R DEBUG: Got GC\n") + workspace$. 
<- new.env(parent=workspace) + } else stop(paste("Unknown command:",cmd)) + flush(sockets[['socketIn']]) + } +} + +subassign <- function(sockets,x,i,value,single=TRUE) { + workspace <- sockets[['workspace']] + assign(".rzeppelin.set.value",value,envir=workspace) + brackets <- if ( single ) c("[","]") else c("[[","]]") + output <- capture.output(result <- try(eval(parse(text=paste0(x,brackets[1],i,brackets[2]," <- .rzeppelin.set.value")),envir=workspace))) + if ( inherits(result,"try-error") ) { + wb(sockets,ERROR) + output <- paste(paste(output,collapse="\n"),paste(attr(result,"condition")$message,collapse="\n"),sep="\n") + wc(sockets,output) + } else { + wb(sockets,OK) + } + rm(".rzeppelin.set.value",envir=workspace) + invisible(value) +} + +new.reference <- function(value,envir) { + name <- "" + while ( ( name == "" ) || ( exists(name,envir=envir) ) ) { + name <- paste0(sample(lEtTeRs,1),paste0(sample(alphabet,7,replace=TRUE),collapse="")) + } + assign(name,value,envir=envir) + name +} + +newSockets <- function (portsFilename, debug, timeout) +{ + getPortNumbers <- function() { + delay <- 0.1 + start <- proc.time()[3] + while (TRUE) { + if ((proc.time()[3] - start) > timeout) + stop("Timed out waiting for Scala to start.") + Sys.sleep(delay) + delay <- 1 * delay + if (file.exists(portsFilename)) { + line <- scan(portsFilename, n = 2, what = character(0), + quiet = TRUE) + if (length(line) > 0) + return(as.numeric(line)) + } + } + } + ports <- getPortNumbers() + file.remove(portsFilename) + if (debug) + cat("R DEBUG: Trying to connect to port:", paste(ports, + collapse = ","), "\n") + socketConnectionIn <- socketConnection(port = ports[1], blocking = TRUE, + open = "ab", timeout = 2678400) + socketConnectionOut <- socketConnection(port = ports[2], + blocking = TRUE, open = "rb", timeout = 2678400) + functionCache <- new.env() + env <- new.env() + assign("open", TRUE, envir = env) + assign("debug", debug, envir = env) + assign("length.one.as.vector", FALSE, envir =
env) + workspace <- new.env() + workspace$. <- new.env(parent = workspace) + result <- list(socketIn = socketConnectionIn, socketOut = socketConnectionOut, + env = env, workspace = workspace, functionCache = functionCache) + class(result) <- "ScalaInterpreter" + status <- rb(result, integer(0)) + if ((length(status) == 0) || (status != OK)) + stop("Error instantiating interpreter.") + wc(result, toString(packageVersion("rzeppelin"))) + flush(result[["socketIn"]]) + result +} http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/R/rzeppelin/R/rzeppelin.R ---------------------------------------------------------------------- diff --git a/r/R/rzeppelin/R/rzeppelin.R b/r/R/rzeppelin/R/rzeppelin.R new file mode 100644 index 0000000..c033efb --- /dev/null +++ b/r/R/rzeppelin/R/rzeppelin.R @@ -0,0 +1,95 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +.zeppenv <- new.env() + +.z.ohandler = evaluate:::new_output_handler( + value = function(x) { + if (is.data.frame(x)) return(x) + if ("html" %in% class(x)) return(x) + if (require("htmltools") & require("knitr")) { + if ("htmlwidget" %in% class(x)) { + return(.z.show.htmlwidget(x)) + } + } + if (isS4(x)) show(x) + else { + if (require("repr")) { + return(repr:::repr(x)) + } else return(x) + } + } +) + +# wrapper for evaluate +.z.valuate <- function(input) evaluate:::evaluate( + input = input, + envir =.zeppenv, + debug = FALSE, + output_handler =.z.ohandler, + stop_on_error = 0 +) + +# converts data.tables to the format needed for display in zeppelin + +.z.table <- function(i) { + + .zdfoutcon <- textConnection(".zdfout", open="w") + write.table(i, + col.names=TRUE, row.names=FALSE, sep="\t", + eol="\n", quote = FALSE, file = .zdfoutcon) + close(.zdfoutcon) + rm(.zdfoutcon) + .zdfout +} + +.z.completion <- function(buf, cursor) { + utils:::.assignLinebuffer(buf) + utils:::.assignEnd(cursor) + utils:::.guessTokenFromLine() + utils:::.completeToken() + utils:::.retrieveCompletions() +} + +.z.setProgress <- function(progress) SparkR:::callJMethod(.rContext, "setProgress", progress %% 100) +.z.incrementProgress <- function(increment = 1) SparkR:::callJMethod(.rContext, "incrementProgress", increment) + +.z.input <- function(name) SparkR:::callJMethod(.zeppelinContext, "input", name) + +.z.get <- function(name) { + isRDD <- SparkR:::callJStatic("org.apache.zeppelin.rinterpreter.RStatics", "testRDD", name) + obj <- SparkR:::callJStatic("org.apache.zeppelin.rinterpreter.RStatics", "getZ", name) + if (isRDD) SparkR:::RDD(obj) + else obj + } + +.z.put <- function(name, object) { + if ("RDD" %in% class(object)) object <- SparkR:::getJRDD(object) + SparkR:::callJStatic("org.apache.zeppelin.rinterpreter.RStatics", "putZ", name, object) + } + +.z.repr <- function(x) { + if (require(repr)) repr:::repr(x) + else toString(x) + } + +progress_zeppelin <- function(...) 
{ + list(init = function(x) .z.setProgress(0), + step = function() .z.incrementProgress(), + term = function() {}) + } + http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/R/rzeppelin/R/scalaInterpreter.R ---------------------------------------------------------------------- diff --git a/r/R/rzeppelin/R/scalaInterpreter.R b/r/R/rzeppelin/R/scalaInterpreter.R new file mode 100644 index 0000000..c7b236f --- /dev/null +++ b/r/R/rzeppelin/R/scalaInterpreter.R @@ -0,0 +1,123 @@ +rzeppelinPackage <- function(pkgname) { + environmentOfDependingPackage <- parent.env(parent.frame()) + E <- new.env(parent=environmentOfDependingPackage) + E$initialized <- FALSE + E$pkgname <- pkgname + assign("E",E,envir=environmentOfDependingPackage) + invisible() +} + + + +# Private + +checkType <- function(x) { + if ( is.integer(x) ) INTEGER + else if ( is.double(x) ) DOUBLE + else if ( is.logical(x) ) BOOLEAN + else if ( is.character(x) ) STRING + else if ( inherits(x,"Date") ) DATE + else stop("Unsupported data type.") +} + +checkType2 <- function(x) { + if ( is.integer(x) ) "Int" + else if ( is.double(x) ) "Double" + else if ( is.logical(x) ) "Boolean" + else if ( is.character(x) ) "String" + else if ( inherits(x,"Date") ) "Date" + else stop("Unsupported data type.") +} + +convert <- function(x,t) { + if ( t == "Int" ) { + tt <- "atomic" + tm <- "integer" + loav <- FALSE + } else if ( t == "Double" ) { + tt <- "atomic" + tm <- "double" + loav <- FALSE + } else if ( t == "Boolean" ) { + tt <- "atomic" + tm <- "logical" + loav <- FALSE + } else if ( t == "String" ) { + tt <- "atomic" + tm <- "character" + loav <- FALSE + } else if ( t == "Array[Int]" ) { + tt <- "vector" + tm <- "integer" + loav <- TRUE + } else if ( t == "Array[Double]" ) { + tt <- "vector" + tm <- "double" + loav <- TRUE + } else if ( t == "Array[Boolean]" ) { + tt <- "vector" + tm <- "logical" + loav <- TRUE + } else if ( t == "Array[String]" ) { + tt <- "vector" + tm <- "character" + loav <- TRUE + } else if ( t
== "Array[Array[Int]]" ) { + tt <- "matrix" + tm <- "integer" + loav <- TRUE + } else if ( t == "Array[Array[Double]]" ) { + tt <- "matrix" + tm <- "double" + loav <- TRUE + } else if ( t == "Array[Array[Boolean]]" ) { + tt <- "matrix" + tm <- "logical" + loav <- TRUE + } else if ( t == "Array[Array[String]]" ) { + tt <- "matrix" + tm <- "character" + loav <- TRUE + } else { + tt <- "reference" + tm <- "reference" + loav <- FALSE + } + v <- character(0) + if ( tt == "atomic" ) v <- c(v,sprintf("%s <- as.vector(%s)[1]",x,x)) + else if ( tt == "vector" ) v <- c(v,sprintf("%s <- as.vector(%s)",x,x)) + else if ( tt == "matrix" ) v <- c(v,sprintf("%s <- as.matrix(%s)",x,x)) + if ( tm != "reference" ) v <- c(v,sprintf("storage.mode(%s) <- '%s'",x,tm)) + if ( length(v) != 0 ) { + v <- c(sprintf("if ( ! inherits(%s,'ScalaInterpreterReference') ) {",x),paste(" ",v,sep=""),"}") + } + c(v,sprintf("intpSet(interpreter,'.',%s,length.one.as.vector=%s,quiet=TRUE)",x,loav)) +} + +cc <- function(c) { + if ( ! get("open",envir=c[['env']]) ) stop("The connection has already been closed.") +} + +wb <- function(c,v) writeBin(v,c[['socketIn']],endian="big") + +wc <- function(c,v) { + bytes <- charToRaw(v) + wb(c,length(bytes)) + writeBin(bytes,c[['socketIn']],endian="big",useBytes=TRUE) +} + +# Sockets should be blocking, but that contract is not fulfilled when other code uses functions from the parallel library. Program around their problem. +rb <- function(c,v,n=1L) { + r <- readBin(c[['socketOut']],what=v,n=n,endian="big") + if ( length(r) == n ) r + else c(r,rb(c,v,n-length(r))) +} + +# Sockets should be blocking, but that contract is not fulfilled when other code uses functions from the parallel library. Program around their problem. 
+rc <- function(c) { + length <- rb(c,integer(0)) + r <- as.raw(c()) + while ( length(r) != length ) r <- c(r,readBin(c[['socketOut']],what="raw",n=length,endian="big")) + rawToChar(r) +} + http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/R/rzeppelin/R/zzz.R ---------------------------------------------------------------------- diff --git a/r/R/rzeppelin/R/zzz.R b/r/R/rzeppelin/R/zzz.R new file mode 100644 index 0000000..d901b99 --- /dev/null +++ b/r/R/rzeppelin/R/zzz.R @@ -0,0 +1,9 @@ +typeMap <- list() +typeMap[[INTEGER]] <- integer(0) +typeMap[[DOUBLE]] <- double(0) +typeMap[[BOOLEAN]] <- integer(0) +typeMap[[STRING]] <- character(0) + +.onAttach <- function(libname, pkgname) { + +} http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/_tools/checkstyle.xml ---------------------------------------------------------------------- diff --git a/r/_tools/checkstyle.xml b/r/_tools/checkstyle.xml new file mode 100644 index 0000000..618d74d --- /dev/null +++ b/r/_tools/checkstyle.xml @@ -0,0 +1,282 @@ [XML content not preserved] http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/d5e87fb8/r/_tools/scalastyle.xml ---------------------------------------------------------------------- diff --git a/r/_tools/scalastyle.xml b/r/_tools/scalastyle.xml new file mode 100644 index 0000000..f7bb0d4 --- /dev/null +++ b/r/_tools/scalastyle.xml @@ -0,0 +1,146 @@ [XML content not preserved; surviving text: "Scalastyle standard configuration", "true"]