spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-8908) Calling distinct() with parentheses throws error in Scala DataFrame
Date Wed, 08 Jul 2015 20:28:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619298#comment-14619298 ]

Apache Spark commented on SPARK-8908:
-------------------------------------

User 'piaozhexiu' has created a pull request for this issue:
https://github.com/apache/spark/pull/7298

> Calling distinct() with parentheses throws error in Scala DataFrame
> -------------------------------------------------------------------
>
>                 Key: SPARK-8908
>                 URL: https://issues.apache.org/jira/browse/SPARK-8908
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0, 1.5.0
>            Reporter: Cheolsoo Park
>            Priority: Minor
>
> To reproduce, call {{distinct()}} on a DataFrame in spark-shell. For example:
> {code}
> scala> sqlContext.table("my_table").distinct()
> <console>:19: error: not enough arguments for method apply: (colName: String)org.apache.spark.sql.Column in class DataFrame.
> Unspecified value parameter colName.
> {code}
> This is confusing because {{distinct}} in DataFrame is an alias of {{dropDuplicates}}, and both {{dropDuplicates}} and {{dropDuplicates()}} work.
> Here is the summary:
> ||Scala code||Works||
> |DF.distinct|Y|
> |DF.distinct()|N|
> |DF.dropDuplicates|Y|
> |DF.dropDuplicates()|Y|
> Looking at the definition of {{distinct}}, it is declared without {{()}}:
> {code}
> override def distinct: DataFrame = dropDuplicates()
> {code}
> As a result, what seems to be happening is as follows:
> {code}
> distinct()
> => dropDuplicates()()
> => DataFrame() // because dropDuplicates() returns DF
> => DataFrame.apply() // fails because apply() takes a column parameter
> {code}
> I can verify that adding {{()}} to the definition makes both {{distinct}} and {{distinct()}} work.
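
The expansion described above can be reproduced outside Spark with a minimal sketch. `Frame`, `Column`, `distinctNoParens` and `distinctWithParens` are hypothetical stand-ins chosen for illustration, not the real Spark classes or methods. The only assumption carried over from the report is that the class has an `apply(colName: String)` method and that the alias is declared without a parameter list.

```scala
// Hypothetical stand-ins for illustration only; not the real Spark classes.
final case class Column(name: String)

class Frame(val rows: Seq[Int]) {
  // Mirrors the shape of DataFrame.apply(colName: String): Column.
  def apply(colName: String): Column = Column(colName)

  def dropDuplicates(): Frame = new Frame(rows.distinct)

  // Declared without a parameter list, like the definition quoted in the report.
  def distinctNoParens: Frame = dropDuplicates()

  // Declared with an empty parameter list, like the fix suggested in the report.
  def distinctWithParens(): Frame = dropDuplicates()
}

object Demo {
  def main(args: Array[String]): Unit = {
    val df = new Frame(Seq(1, 1, 2))

    // Both calls below compile and return a de-duplicated Frame.
    println(df.distinctNoParens.rows)
    println(df.distinctWithParens().rows)

    // Parentheses after a parameterless method are applied to its result,
    // so this is df.distinctNoParens.apply("id") and yields a Column.
    println(df.distinctNoParens("id"))

    // For the same reason, the next line would not compile: it is read as
    // df.distinctNoParens.apply() and apply requires colName.
    // df.distinctNoParens()
  }
}
```

Uncommenting the last call should produce a "not enough arguments for method apply" error of the same kind as the one in the report, which is consistent with the explanation that the trailing `()` is treated as a call to `apply` on the returned object.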



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
