spark-issues mailing list archives

From "Takeshi Yamamuro (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23677) Selecting columns from joined DataFrames with the same origin yields wrong results
Date Thu, 15 Mar 2018 06:25:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399969#comment-16399969 ]

Takeshi Yamamuro commented on SPARK-23677:
------------------------------------------

Do you mean SPARK-14948? I think this is a well-known issue.

> Selecting columns from joined DataFrames with the same origin yields wrong results
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-23677
>                 URL: https://issues.apache.org/jira/browse/SPARK-23677
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Martin Mauch
>            Priority: Major
>
> When two DataFrames derived from the same origin DataFrame are joined and columns are then
> selected from the join result, Spark can't distinguish between the columns and gives a wrong
> (or at least very surprising) result. One can work around this using expr.
> Here is a minimal example:
>  
> {code:java}
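> import org.apache.spark.sql.functions.expr
> import spark.implicits._
> // Both sides of the join are derived from the same DataFrame
> val edf = Seq((1), (2), (3), (4), (5)).toDF("num")
> val big = edf.where(edf("num") > 2).alias("big")
> val small = edf.where(edf("num") < 4).alias("small")
> // Selecting via small("num") and big("num") shows the same value in both columns
> small.join(big, expr("big.num == (small.num + 1)")).select(small("num"), big("num")).show()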
> // +---+---+
> // |num|num|
> // +---+---+
> // |  2|  2|
> // |  3|  3|
> // +---+---+
> // Workaround: select the columns through expr with alias-qualified names
> small.join(big, expr("big.num == (small.num + 1)")).select(expr("small.num"), expr("big.num")).show()
> // +---+---+
> // |num|num|
> // +---+---+
> // |  2|  3|
> // |  3|  4|
> // +---+---+
> {code}
>  
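
For comparison, the same workaround can probably be written with `col` instead of `expr`, since both take an alias-qualified name that is resolved against the join output. The following is only a sketch under assumptions: the object name `AliasQualifiedSelect`, the helper `pairs`, and the local `SparkSession` setup are hypothetical and not part of the report, and the behaviour is expected to match the `expr` variant above rather than having been verified on the affected versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

// Hypothetical wrapper object; only the DataFrame operations come from the report.
object AliasQualifiedSelect {

  // Builds the two aliased views of one DataFrame, joins them, and selects
  // the columns by alias-qualified name instead of via small("num") / big("num").
  def pairs(spark: SparkSession): Array[(Int, Int)] = {
    import spark.implicits._

    val edf = Seq(1, 2, 3, 4, 5).toDF("num")
    val big = edf.where(edf("num") > 2).alias("big")
    val small = edf.where(edf("num") < 4).alias("small")

    small
      .join(big, expr("big.num == (small.num + 1)"))
      // col("small.num") and col("big.num") are looked up through the aliases
      .select(col("small.num"), col("big.num"))
      .as[(Int, Int)]
      .collect()
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the report)
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("spark-23677-sketch")
      .getOrCreate()
    try pairs(spark).foreach(println)
    finally spark.stop()
  }
}
```

The point of the sketch is that the selection goes through the alias names (`small`, `big`) rather than through column references taken from the two DataFrame values, which is the part the report identifies as ambiguous.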


