spark-issues mailing list archives

From "Khoa Tran (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-23705) dataframe.groupBy() may inadvertently receive sequence of non-distinct strings
Date Fri, 16 Mar 2018 03:51:00 GMT
Khoa Tran created SPARK-23705:
---------------------------------

             Summary: dataframe.groupBy() may inadvertently receive sequence of non-distinct strings
                 Key: SPARK-23705
                 URL: https://issues.apache.org/jira/browse/SPARK-23705
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Khoa Tran


{code:scala}
package org.apache.spark.sql
// ...
class Dataset[T] private[sql](
// ...
def groupBy(col1: String, cols: String*): RelationalGroupedDataset = {
  val colNames: Seq[String] = col1 +: cols
  RelationalGroupedDataset(
    toDF(), colNames.map(colName => resolve(colName)), RelationalGroupedDataset.GroupByType)
}
{code}
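`groupBy` should append `.distinct` to `colNames`, so that a column name passed more than once is resolved and grouped on only once.

I am not sure whether the community agrees with this, or whether it should be left to users to deduplicate the column names themselves.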
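
To make the proposal concrete, here is a minimal sketch in plain Scala with no Spark dependency. The object `GroupByDistinctSketch` and the helper `groupingColumnNames` are hypothetical names used only for illustration. The helper builds the name list the same way `groupBy` does (`col1 +: cols`) and then applies `.distinct`.

```scala
object GroupByDistinctSketch {
  // Hypothetical helper, not part of Spark: mirrors `col1 +: cols` from
  // Dataset.groupBy and then drops repeated names.
  def groupingColumnNames(col1: String, cols: String*): Seq[String] =
    (col1 +: cols).distinct // keeps the first occurrence of each name, in order

  def main(args: Array[String]): Unit = {
    // A caller-built list that accidentally repeats "dept".
    val requested = Seq("dept", "region", "dept")
    val names = groupingColumnNames(requested.head, requested.tail: _*)
    println(names.mkString(", ")) // prints "dept, region"
  }
}
```

The same deduplication could live in either place: inside `groupBy`, as proposed, or on the caller's side, for example by passing `names.head` and `names.tail: _*` to `groupBy`.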



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


