spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-11976) Support "." character in DataFrame column name
Date Tue, 19 Jul 2016 14:54:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-11976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384282#comment-15384282 ]

Apache Spark commented on SPARK-11976:
--------------------------------------

User 'rerngvit' has created a pull request for this issue:
https://github.com/apache/spark/pull/14264

> Support "." character in DataFrame column name
> ----------------------------------------------
>
>                 Key: SPARK-11976
>                 URL: https://issues.apache.org/jira/browse/SPARK-11976
>             Project: Spark
>          Issue Type: Improvement
>          Components: SparkR
>    Affects Versions: 1.5.2
>            Reporter: Sun Rui
>
> Spark Core now supports the "." character in DataFrame column names. However, to access a column whose name contains a "." character, the name has to be wrapped in backticks.
> For example:
> {code}
> > df<-createDataFrame(sqlContext, list(list(1,2,3)))
> > names(df)<-c("a.b","c","d.e")
> > df$"`a.b`"
> Column a.b 
> > df$"a.b"
> 15/11/25 10:55:06 ERROR RBackendHandler: col on 68 failed
> Error in column(callJMethod(x@sdf, "col", c)) : 
>   error in evaluating the argument 'x' in selecting a method for function 'column': Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) : 
>   org.apache.spark.sql.AnalysisException: Cannot resolve column name "a.b" among (a.b, c, d.e);
> 	at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:151)
> 	at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:151)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:150)
> 	at org.apache.spark.sql.DataFrame.col(DataFrame.scala:663)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
> 	at org.apache.spark.api.r.RBackendHa
> {code}
> This means that when a column name is fetched programmatically rather than known in advance, the safe way to select the column is to wrap the name in backticks.
> Once this is supported, the code below can be removed from createDataFrame():
> {code}
>     # SPARK-SQL does not support '.' in column name, so replace it with '_'
>     # TODO(davies): remove this once SPARK-2775 is fixed
>     names <- lapply(names, function(n) {
>       nn <- gsub("[.]", "_", n)
>       if (nn != n) {
>         warning(paste("Use", nn, "instead of", n, " as column name"))
>       }
>       nn
>     })
> {code}
> The PR for SPARK-12034 suppresses warnings when creating a DataFrame from iris in test cases. Remember to remove that warning suppression.
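
The backtick workaround described in the issue can be sketched as a small plain-R helper for column names that are fetched programmatically. This is only an illustration: quote_column_name is a hypothetical name, not a SparkR function, and doubling an embedded backtick is an assumption about how such a character would need to be escaped.

```r
# Hypothetical helper (not part of SparkR): wrap column names in backticks
# so that a "." in a name is not treated as a nested-field separator.
quote_column_name <- function(name) {
  stopifnot(is.character(name))
  # Assumption: an embedded backtick is escaped by doubling it.
  escaped <- gsub("`", "``", name, fixed = TRUE)
  paste0("`", escaped, "`")
}

# Same column names as in the issue's example; the helper is vectorized.
quoted <- quote_column_name(c("a.b", "c", "d.e"))
print(quoted)
```

Each resulting string has the same form as the "`a.b`" literal used after df$ in the example above, so it could be used wherever a column is selected by a name that is not known in advance.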



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


