spark-issues mailing list archives

From "Xusen Yin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-13119) SparkR Ser/De fail to handle "columns(df)"
Date Mon, 01 Feb 2016 20:29:39 GMT

     [ https://issues.apache.org/jira/browse/SPARK-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xusen Yin updated SPARK-13119:
------------------------------
    Description: 
While extracting the column names of a DataFrame for https://issues.apache.org/jira/browse/SPARK-13011,
I found that the Ser/De of SparkR fails to handle them, as illustrated in the test
code below:

{code:title=test_Serde.R|theme=FadeToGrey|linenumbers=true|language=R|firstline=0001|collapse=false}
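test_that("SerDe of primitive types", {
  sqlContext <- sparkRSQL.init(sc)
  # suppressWarnings hides any warnings raised while converting iris
  df <- suppressWarnings(createDataFrame(sqlContext, iris))
  # Column names of the DataFrame
  names <- columns(df)
  # Round-trip the names through the JVM; the error is raised by this call
  x <- callJStatic("SparkRHandler", "echo", names)
  expect_equal(x, names)
  expect_equal(class(x), class(names))
})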
{code}

Running the test produces the following error:

{code:title=stack-trace|theme=FadeToGrey|linenumbers=true|language=R|firstline=0001|collapse=false}
1. Error: SerDe of primitive types ---------------------------------------------
(converted from warning) the condition has length > 1 and only the first element will be used
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: callJStatic("SparkRHandler", "echo", names) at test_Serde.R:42
5: invokeJava(isStatic = TRUE, className, methodName, ...)
6: writeArgs(rc, args)
7: writeObject(con, a)
8: .signalSimpleWarning("the condition has length > 1 and only the first element will be used",
       quote(if (is.na(object)) {
           object <- NULL
           type <- "NULL"
       }))
9: withRestarts({
       .Internal(.signalCondition(simpleWarning(msg, call), msg, call))
       .Internal(.dfltWarn(msg, call))
   }, muffleWarning = function() NULL)
10: withOneRestart(expr, restarts[[1L]])
11: doWithOneRestart(return(expr), restart) 
{code}

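The failure occurs because "class(columns(df))" returns "character". Ser/De uses the class of
an object to determine its type and select the matching ser/de method. However, "columns(df)"
is not the common single-valued "character" object: frames 7 and 8 of the stack trace show
that writeObject evaluates "if (is.na(object))", and the warning reports that this condition
has length > 1, i.e. the object holds more than one element. The ser/de therefore fails.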
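
The condition shown in frame 8 of the stack trace can be explored without Spark. The following is a minimal base-R sketch, assuming that "columns(df)" returns a plain character vector with one entry per column. The vector below is a hypothetical stand-in for that value, and isScalarNA is an illustrative helper, not SparkR code and not necessarily the fix adopted for this issue.

```r
# Hypothetical stand-in for columns(df); the real value comes from SparkR.
names <- c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width", "Species")

# class() reports "character" for a single string and for a vector alike,
# so a dispatch based on class() alone cannot tell the two apart.
class("Species")  # "character"
class(names)      # "character"

# is.na() is vectorized: it returns one logical value per element, so using
# it directly as an `if` condition gives a condition of length > 1.
naFlags <- is.na(names)
length(naFlags)   # 5

# Illustrative guard (an assumption, not the SparkR implementation):
# treat the object as NA only when it is a single NA value.
isScalarNA <- function(object) {
  length(object) == 1 && is.na(object)
}

isScalarNA(names)          # FALSE: a vector of names is not a scalar NA
isScalarNA(NA_character_)  # TRUE
```

Checking the length before calling is.na keeps the condition a single logical value, which is what `if` expects.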

  was:
Ser/De of SparkR fail to handle column names of a DataFrame, as illustrated in the test code
below:

{code:title=test_Serde.R|theme=FadeToGrey|linenumbers=true|language=R|firstline=0001|collapse=false}
test_that("SerDe of primitive types", {
  sqlContext <- sparkRSQL.init(sc)
  df <- suppressWarnings(createDataFrame(sqlContext, iris))
  names <- columns(df)
  x <- callJStatic("SparkRHandler", "echo", names)
  expect_equal(x, names)
  expect_equal(class(x), class(names))
})    
{code}

We can get the following error:

{code:title=stack-trace|theme=FadeToGrey|linenumbers=true|language=R|firstline=0001|collapse=false}
1. Error: SerDe of primitive types ---------------------------------------------
(converted from warning) the condition has length > 1 and only the first element will be used
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: callJStatic("SparkRHandler", "echo", names) at test_Serde.R:42
5: invokeJava(isStatic = TRUE, className, methodName, ...)
6: writeArgs(rc, args)
7: writeObject(con, a)
8: .signalSimpleWarning("the condition has length > 1 and only the first element will be used",
       quote(if (is.na(object)) {
           object <- NULL
           type <- "NULL"
       }))
9: withRestarts({
       .Internal(.signalCondition(simpleWarning(msg, call), msg, call))
       .Internal(.dfltWarn(msg, call))
   }, muffleWarning = function() NULL)
10: withOneRestart(expr, restarts[[1L]])
11: doWithOneRestart(return(expr), restart) 
{code}

It occurs because the "class(columns(df))" is character. Ser/De uses the result to check the
type of object and select different ser/de methods. However, "columns(df)" is not the common
"character" type so the ser/de fails.


> SparkR Ser/De fail to handle "columns(df)"
> ------------------------------------------
>
>                 Key: SPARK-13119
>                 URL: https://issues.apache.org/jira/browse/SPARK-13119
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>            Reporter: Xusen Yin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

