spark-commits mailing list archives

From felixche...@apache.org
Subject spark git commit: [SPARK-23291][SQL][R] R's substr should not reduce starting position by 1 when calling Scala API
Date Wed, 07 Mar 2018 17:37:46 GMT
Repository: spark
Updated Branches:
  refs/heads/master aff7d81cb -> 53561d27c


[SPARK-23291][SQL][R] R's substr should not reduce starting position by 1 when calling Scala API

## What changes were proposed in this pull request?

It seems R's `substr` API treated Scala's `substr` API as zero-based, and so subtracted 1 from the given starting position.

Because Scala's `substr` API also accepts a zero-based starting position (which it treats the same as the first element), the existing R `substr` tests still produced correct results, as they all use 1 as the starting position.
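The off-by-one can be sketched outside Spark. The following Python emulation (an illustration only, not Spark's actual implementation) mimics the 1-based SQL substring semantics that the JVM side exposes, and shows both why the old code was wrong for `start > 1` and why tests using `start = 1` masked the bug:

```python
def sql_substring(s, pos, length):
    # Emulates the 1-based SQL/Scala substring(pos, len); a pos of 0
    # is treated like pos = 1 (both point at the first character),
    # which is what hid the bug when start was 1.
    if pos == 0:
        pos = 1
    start = pos - 1
    return s[start:start + length]

def sparkr_substr_old(s, start, stop):
    # Old SparkR behavior: subtracted 1 from start before the JVM call.
    return sql_substring(s, start - 1, stop - start + 1)

def sparkr_substr_new(s, start, stop):
    # Fixed SparkR behavior: passes the 1-based start straight through.
    return sql_substring(s, start, stop - start + 1)

print(sparkr_substr_old("abcdef", 2, 4))  # 'abc' -- off by one
print(sparkr_substr_new("abcdef", 2, 4))  # 'bcd' -- matches base R substr
print(sparkr_substr_old("abcdef", 1, 4))  # 'abcd' -- start = 1 masked the bug
```

With `start = 1` the old code passed 0 to the JVM, which was silently treated as 1, so only calls with `start > 1` exposed the inconsistency.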

## How was this patch tested?

Modified tests.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #20464 from viirya/SPARK-23291.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/53561d27
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/53561d27
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/53561d27

Branch: refs/heads/master
Commit: 53561d27c45db31893bcabd4aca2387fde869b72
Parents: aff7d81
Author: Liang-Chi Hsieh <viirya@gmail.com>
Authored: Wed Mar 7 09:37:42 2018 -0800
Committer: Felix Cheung <felixcheung@apache.org>
Committed: Wed Mar 7 09:37:42 2018 -0800

----------------------------------------------------------------------
 R/pkg/R/column.R                      | 10 ++++++++--
 R/pkg/tests/fulltests/test_sparkSQL.R |  1 +
 docs/sparkr.md                        |  4 ++++
 3 files changed, 13 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/53561d27/R/pkg/R/column.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/column.R b/R/pkg/R/column.R
index 9727efc..7926a9a 100644
--- a/R/pkg/R/column.R
+++ b/R/pkg/R/column.R
@@ -161,12 +161,18 @@ setMethod("alias",
 #' @aliases substr,Column-method
 #'
 #' @param x a Column.
-#' @param start starting position.
+#' @param start starting position. It should be 1-based.
 #' @param stop ending position.
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(a="abcdef")))
+#' collect(select(df, substr(df$a, 1, 4))) # the result is `abcd`.
+#' collect(select(df, substr(df$a, 2, 4))) # the result is `bcd`.
+#' }
 #' @note substr since 1.4.0
 setMethod("substr", signature(x = "Column"),
           function(x, start, stop) {
-            jc <- callJMethod(x@jc, "substr", as.integer(start - 1), as.integer(stop - start + 1))
+            jc <- callJMethod(x@jc, "substr", as.integer(start), as.integer(stop - start + 1))
             column(jc)
           })
 

http://git-wip-us.apache.org/repos/asf/spark/blob/53561d27/R/pkg/tests/fulltests/test_sparkSQL.R
----------------------------------------------------------------------
diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index bd0a0dc..439191a 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -1651,6 +1651,7 @@ test_that("string operators", {
   expect_false(first(select(df, startsWith(df$name, "m")))[[1]])
   expect_true(first(select(df, endsWith(df$name, "el")))[[1]])
   expect_equal(first(select(df, substr(df$name, 1, 2)))[[1]], "Mi")
+  expect_equal(first(select(df, substr(df$name, 4, 6)))[[1]], "hae")
   if (as.numeric(R.version$major) >= 3 && as.numeric(R.version$minor) >= 3) {
     expect_true(startsWith("Hello World", "Hello"))
     expect_false(endsWith("Hello World", "a"))

http://git-wip-us.apache.org/repos/asf/spark/blob/53561d27/docs/sparkr.md
----------------------------------------------------------------------
diff --git a/docs/sparkr.md b/docs/sparkr.md
index 6685b58..2909247 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -663,3 +663,7 @@ You can inspect the search path in R with [`search()`](https://stat.ethz.ch/R-ma
 - The `stringsAsFactors` parameter was previously ignored with `collect`, for example, in `collect(createDataFrame(iris), stringsAsFactors = TRUE))`. It has been corrected.
 - For `summary`, option for statistics to compute has been added. Its output is changed from that from `describe`.
  - A warning can be raised if versions of SparkR package and the Spark JVM do not match.
+
+## Upgrading to Spark 2.4.0
+
+ - The `start` parameter of the `substr` method was previously subtracted by one; in other words, the index specified by `start` was treated as 0-based. This could lead to inconsistent substring results and also did not match the behaviour of `substr` in R. It has been fixed so that the `start` parameter of the `substr` method is now 1-based; e.g., to get the same result that `substr(df$a, 2, 5)` previously returned, the call should be changed to `substr(df$a, 1, 4)`.


