spark-issues mailing list archives

From Mikołaj Hnatiuk (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SPARK-15294) Add pivot functionality to SparkR
Date Mon, 23 May 2016 16:11:12 GMT

    [ https://issues.apache.org/jira/browse/SPARK-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296573#comment-15296573
] 

Mikołaj Hnatiuk commented on SPARK-15294:
-----------------------------------------

Hi, ok, so I have this function defined:

{code}
# In "generics.R":
# @rdname pivot
# @export
setGeneric("pivot", function(x, colname, values = NULL) { standardGeneric("pivot") })

# In "group.R":
setMethod("pivot",
          signature(x = "GroupedData"),
          function(x, colname, values = NULL) {
            if (is.null(values)) {
              # No explicit values: let Spark determine the distinct pivot values.
              result <- SparkR:::callJMethod(x@sgd, "pivot", colname)
            } else {
              # Pivot values must be unique.
              stopifnot(length(values) == length(unique(values)))
              result <- SparkR:::callJMethod(x@sgd, "pivot", colname, values)
            }
            SparkR:::groupedData(result)
          })
{code}

And now I'm trying to do this:

{code}
library(magrittr)  # provides %>%

df <- createDataFrame(sqlContext, data.frame(
  earnings = c(10000, 10000, 11000, 15000, 12000, 20000, 21000, 22000),
  course = c("R", "Python", "R", "Python", "R", "Python", "R", "Python"),
  year = c(2013, 2013, 2014, 2014, 2015, 2015, 2016, 2016)
))
# `values` is not defined in this snippet; it is assumed to hold the pivot values.
sums <- groupBy(df, "year") %>%
  pivot("course", values) %>%
  SparkR::summarize(sumOfEarnings = sum(df$earnings)) %>%
  collect()
{code}

It apparently works, but look at the last transformation (summarize): I have to use sum(df$earnings)
instead of just giving a column name. *I shouldn't be summing the variable "earnings" from the DataFrame*.
Instead, _I should_ be able to sum "earnings" from the GroupedData object that the function "pivot"
returns, right?
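
For comparison, here is a minimal sketch of aggregating by column name on the GroupedData object itself. It relies on the named-argument form of agg() that SparkR provides for GroupedData (argument name = column, value = name of the aggregate function). Whether this form behaves as expected on the object returned by the pivot() defined above is an assumption, not something verified here. The sketch omits the values argument, so only the is.null(values) branch is exercised.

{code}
# Sketch only: assumes SparkR is attached, `df` is the DataFrame created above,
# and the pivot() method for GroupedData from this comment has been defined.
library(magrittr)  # provides %>%

# Group by year, then pivot on course without explicit pivot values.
gd <- groupBy(df, "year") %>%
  pivot("course")

# Named-argument form of agg(): column name = aggregate function name,
# so no reference to df$earnings is needed.
sums <- agg(gd, earnings = "sum") %>%
  collect()
{code}

If this works, the column is addressed purely by name on the grouped object, which is the behaviour asked about above.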

Anyway, please give it a try :) I know this is the smallest commit ever, but I'd be delighted
if you would open a PR for this.



> Add pivot functionality to SparkR
> ---------------------------------
>
>                 Key: SPARK-15294
>                 URL: https://issues.apache.org/jira/browse/SPARK-15294
>             Project: Spark
>          Issue Type: Improvement
>          Components: SparkR
>            Reporter: Mikołaj Hnatiuk
>            Priority: Minor
>              Labels: pivot
>
> R users are very used to transforming data using functions such as dcast (pkg:reshape2).
> https://github.com/apache/spark/pull/7841 introduces such functionality to the Scala and Python
> APIs. I'd like to suggest adding this functionality to the SparkR API to pivot DataFrames.
> I'd love to do this. My knowledge of Scala is still limited, but with proper
> guidance I can give it a try.


