spark-issues mailing list archives

From "Priyanka Garg (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-15797) To expose groupingSets for DataFrame
Date Tue, 07 Jun 2016 04:33:20 GMT
Priyanka Garg created SPARK-15797:
-------------------------------------

             Summary: To expose groupingSets for DataFrame
                 Key: SPARK-15797
                 URL: https://issues.apache.org/jira/browse/SPARK-15797
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 1.5.1
            Reporter: Priyanka Garg


Currently, the cube and rollup functions are exposed on DataFrame, but grouping sets are not.
For example,
df.rollup($"department", $"group", $"designation").avg() produces:
a. One row per combination of department, group and designation
b. One row per combination of department and group, with designation as null
c. One row per department, with group and designation as null
d. One row with department, group and designation all null (i.e. the aggregate over the complete data)
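
The four result levels above correspond to the successive prefixes of the rollup column list. A minimal plain-Scala sketch of that expansion follows; rollupGroupingSets is a hypothetical helper written for illustration and is not part of the Spark API.

```scala
// Hypothetical helper, not part of Spark: lists the grouping sets that
// rollup(cols) aggregates over, from the full column list down to the empty set.
def rollupGroupingSets(cols: Seq[String]): Seq[Seq[String]] =
  (cols.length to 0 by -1).map(n => cols.take(n))

val sets = rollupGroupingSets(Seq("department", "group", "designation"))

// Prints one grouping set per line, matching cases a. to d. above.
sets.foreach(s => println(s.mkString("(", ", ", ")")))
```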

Along the same lines, there should be a groupingSets function in which custom groupings can
be specified.
For example,
df.groupingSets(($"department", $"group", $"designation"), ($"group"), ($"designation"), ()).avg()
should produce:
1. One row per combination of department, group and designation
2. One row per value of group, with department and designation as null
3. One row per value of designation, with department and group as null
4. One row aggregating the complete data, with department, group and designation all null
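
To make the requested semantics concrete, here is a rough sketch on in-memory Scala collections, with no Spark dependency. The names Employee, columns and groupingSetsAvg, the salary field and the sample rows are all assumptions made for illustration; this is not the proposed DataFrame implementation, only one reading of what it should compute.

```scala
// Plain-Scala sketch of grouping-sets semantics (no Spark dependency).
// Employee, columns and groupingSetsAvg are hypothetical names for illustration.
case class Employee(department: String, group: String, designation: String, salary: Double)

// Column name -> accessor, in output order.
val columns: Seq[(String, Employee => String)] = Seq(
  "department"  -> ((e: Employee) => e.department),
  "group"       -> ((e: Employee) => e.group),
  "designation" -> ((e: Employee) => e.designation)
)

// For each grouping set, group rows by the listed columns only; columns
// outside the set become None (the analogue of null in the result).
def groupingSetsAvg(rows: Seq[Employee],
                    sets: Seq[Set[String]]): Seq[(Seq[Option[String]], Double)] =
  sets.flatMap { set =>
    rows
      .groupBy(r => columns.map { case (name, get) => if (set(name)) Some(get(r)) else None })
      .map { case (key, grp) => key -> grp.map(_.salary).sum / grp.size }
      .toSeq
  }

// Made-up sample data.
val rows = Seq(
  Employee("eng", "platform", "dev", 100.0),
  Employee("eng", "platform", "lead", 200.0),
  Employee("ops", "platform", "dev", 300.0)
)

// The same four grouping sets as in the example above.
val result = groupingSetsAvg(rows, Seq(
  Set("department", "group", "designation"),
  Set("group"),
  Set("designation"),
  Set.empty[String]
))

result.foreach(println)
```

Each grouping set contributes its own block of rows, so the output is the union of four independent group-by results rather than a single nested aggregation.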






