spark-issues mailing list archives

From "Liang-Chi Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-23614) Union produces incorrect results when caching is used
Date Thu, 15 Mar 2018 06:44:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-23614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang-Chi Hsieh updated SPARK-23614:
------------------------------------
    Component/s: SQL
                     (was: Spark Core)

> Union produces incorrect results when caching is used
> -----------------------------------------------------
>
>                 Key: SPARK-23614
>                 URL: https://issues.apache.org/jira/browse/SPARK-23614
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Morten Hornbech
>            Priority: Major
>
> We just upgraded from 2.2 to 2.3 and our test suite caught this error:
> {code:java}
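> import org.apache.spark.sql.functions.{col, min}
> import session.implicits._ // session is an existing SparkSession
> case class TestData(x: Int, y: Int, z: Int)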
> val frame = session.createDataset(Seq(TestData(1, 2, 3), TestData(4, 5, 6))).cache()
> val group1 = frame.groupBy("x").agg(min(col("y")) as "value")
> val group2 = frame.groupBy("x").agg(min(col("z")) as "value")
> group1.union(group2).show()
> // +---+-----+
> // |  x|value|
> // +---+-----+
> // |  1|    2|
> // |  4|    5|
> // |  1|    2|
> // |  4|    5|
> // +---+-----+
> group2.union(group1).show()
> // +---+-----+
> // |  x|value|
> // +---+-----+
> // |  1|    3|
> // |  4|    6|
> // |  1|    3|
> // |  4|    6|
> // +---+-----+
> {code}
> The error disappears if the first data frame is not cached or if the two group-bys use
> separate copies. I'm not sure exactly what happens inside Spark, but errors that
> produce incorrect results rather than exceptions always concern me.
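
To make the cached/uncached comparison in the report easier to repeat, here is a minimal sketch, not part of the original report. It assumes a local SparkSession; the names `UnionCacheCheck`, `unionOfMins` and `cacheInput` are hypothetical and introduced only for illustration. It covers only the "not cached" variant of the workaround, since the report does not say how the "separate copies" were made. With correct semantics the union should contain the values 2, 3, 5 and 6; according to the report, the cached run on 2.3.0 does not.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, min}

// Same shape as the case class in the report.
case class TestData(x: Int, y: Int, z: Int)

object UnionCacheCheck {
  // Hypothetical helper: builds the two aggregates from the report and
  // returns the sorted "value" column of group1.union(group2).
  def unionOfMins(session: SparkSession, cacheInput: Boolean): Seq[Int] = {
    import session.implicits._
    val base = session.createDataset(Seq(TestData(1, 2, 3), TestData(4, 5, 6)))
    // The report says the wrong result only shows up when this frame is cached.
    val frame = if (cacheInput) base.cache() else base
    val group1 = frame.groupBy("x").agg(min(col("y")) as "value")
    val group2 = frame.groupBy("x").agg(min(col("z")) as "value")
    group1.union(group2).select("value").as[Int].collect().toSeq.sorted
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough to run the check.
    val session = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-23614-check")
      .getOrCreate()
    try {
      println(s"uncached: ${unionOfMins(session, cacheInput = false)}")
      println(s"cached:   ${unionOfMins(session, cacheInput = true)}")
    } finally {
      session.stop()
    }
  }
}
```

Comparing the two printed sequences on a given Spark version shows whether that version is affected; a test suite could assert that both equal the uncached result.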



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)