spark-issues mailing list archives

From "Josh Rosen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-27619) MapType should be prohibited in hash expressions
Date Thu, 02 May 2019 02:37:00 GMT
Josh Rosen created SPARK-27619:
----------------------------------

             Summary: MapType should be prohibited in hash expressions
                 Key: SPARK-27619
                 URL: https://issues.apache.org/jira/browse/SPARK-27619
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Josh Rosen


Spark currently allows MapType expressions to be used as input to hash expressions, but
I think that this should be prohibited because Spark SQL does not support map equality. Currently,
Spark SQL's map hashcodes are sensitive to the insertion order of map elements:
{code:scala}
// Run in spark-shell, where `spark` and its implicits are already in scope.
val a = spark.createDataset(Map(1->1, 2->2) :: Nil)
val b = spark.createDataset(Map(2->2, 1->1) :: Nil)

// Demonstration of how Scala Map equality is unaffected by insertion order:
assert(Map(1->1, 2->2).hashCode() == Map(2->2, 1->1).hashCode())
assert(Map(1->1, 2->2) == Map(2->2, 1->1))
assert(a.first() == b.first())

// In contrast, this will print two different hashcodes:
println(Seq(a, b).map(_.selectExpr("hash(*)").first()))
{code}
I think there's precedent for banning the use of MapType here because we already prohibit
MapType in aggregation / joins (SPARK-9415) and set operations (SPARK-19893).

Alternatively, we could support hashing here if we implemented support for comparable map
types (SPARK-18134).
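
As a rough illustration of what order-insensitive map hashing could look like, here is a minimal plain-Scala sketch. It is not Spark's implementation and not the design proposed in SPARK-18134. It assumes a map is stored as a sequence of key/value pairs in insertion order, and the names `MapHashSketch`, `insertionOrderHash`, `normalizedHash` and the seed value are hypothetical, chosen only for this example. The idea is to sort the entries by key before hashing, so that two logically equal maps always feed the same sequence to the hash function.

```scala
import scala.util.hashing.MurmurHash3

object MapHashSketch {
  // Hypothetical seed, used only for this illustration.
  val Seed: Int = 42

  // Order-sensitive: hashes the entries in the order in which they are stored,
  // so the result depends on insertion order.
  def insertionOrderHash(entries: Seq[(Int, Int)]): Int =
    MurmurHash3.orderedHash(entries, Seed)

  // Order-insensitive: sorts the entries by key before hashing, so any two
  // sequences holding the same entries produce the same hash.
  def normalizedHash(entries: Seq[(Int, Int)]): Int =
    MurmurHash3.orderedHash(entries.sortBy(_._1), Seed)

  def main(args: Array[String]): Unit = {
    val a = Seq(1 -> 1, 2 -> 2)
    val b = Seq(2 -> 2, 1 -> 1)

    // Depends on insertion order; the two values will generally differ.
    println(s"insertion-order hashes: ${insertionOrderHash(a)}, ${insertionOrderHash(b)}")

    // Identical by construction, because both inputs sort to the same sequence.
    assert(normalizedHash(a) == normalizedHash(b))
    println(s"normalized hashes: ${normalizedHash(a)}, ${normalizedHash(b)}")
  }
}
```

Sorting requires an ordering on the key type, which is one reason a real fix would depend on comparable map types rather than a change to the hash expression alone.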


