spark-issues mailing list archives

From "Wenchen Fan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-17867) Dataset.dropDuplicates (i.e. distinct) should consider the columns with same column name
Date Thu, 13 Oct 2016 05:30:21 GMT

     [ https://issues.apache.org/jira/browse/SPARK-17867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan updated SPARK-17867:
--------------------------------
    Assignee: Liang-Chi Hsieh

> Dataset.dropDuplicates (i.e. distinct) should consider the columns with same column name
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-17867
>                 URL: https://issues.apache.org/jira/browse/SPARK-17867
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Liang-Chi Hsieh
>            Assignee: Liang-Chi Hsieh
>             Fix For: 2.1.0
>
>
> In Dataset.dropDuplicates, we look up only the first resolved attribute in the output that
matches each given column name. When more than one column has the same name, the other columns
are put into the aggregation columns instead of the grouping columns. We should fix this.
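
The behavior described above can be illustrated with a toy model. This is a sketch in plain Python, not Spark's code: `Attr`, `grouping_first_match`, `grouping_all_matches`, and `drop_duplicates` are hypothetical names made up for this illustration, and "keep the first row per group" stands in for aggregating the non-grouping columns. It assumes distinct passes every output column name to dropDuplicates, as the issue title implies.

```python
from collections import namedtuple

# Toy stand-in for a resolved attribute: a column name plus a unique id.
Attr = namedtuple("Attr", ["name", "attr_id"])

def grouping_first_match(output, col_names):
    """Reported behavior: each name resolves to the first matching attribute only."""
    return [next(a for a in output if a.name == name) for name in col_names]

def grouping_all_matches(output, col_names):
    """One possible fix (an assumption): group by every attribute whose name matches."""
    return [a for a in output if a.name in col_names]

def drop_duplicates(rows, output, group_attrs):
    """Group rows by the grouping attributes and keep the first row of each group."""
    idx = [output.index(a) for a in group_attrs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in idx)
        seen.setdefault(key, row)  # later rows with the same key are dropped
    return list(seen.values())

# Two columns that share the name "a", e.g. after a join.
output = [Attr("a", 0), Attr("a", 1)]
rows = [(1, 1), (1, 2), (1, 1)]
names = [a.name for a in output]  # ["a", "a"]

buggy = drop_duplicates(rows, output, grouping_first_match(output, names))
fixed = drop_duplicates(rows, output, grouping_all_matches(output, names))
print(buggy)  # [(1, 1)]
print(fixed)  # [(1, 1), (1, 2)]
```

With first-match resolution, both names resolve to the first column, so the second column never takes part in the grouping key and the distinct row (1, 2) is lost.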


