spark-issues mailing list archives

From "Josh Rosen (JIRA)"
Subject [jira] [Created] (SPARK-27619) MapType should be prohibited in hash expressions
Date Thu, 02 May 2019 02:37:00 GMT
Josh Rosen created SPARK-27619:

             Summary: MapType should be prohibited in hash expressions
                 Key: SPARK-27619
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Josh Rosen

Spark currently allows MapType expressions to be used as input to hash expressions, but
I think that this should be prohibited because Spark SQL does not support map equality. At present,
Spark SQL's map hash codes are sensitive to the insertion order of map elements:
{code}
// Run in spark-shell, where `spark` and spark.implicits._ are already in scope.
val a = spark.createDataset(Map(1->1, 2->2) :: Nil)
val b = spark.createDataset(Map(2->2, 1->1) :: Nil)

// Demonstration of how Scala Map equality is unaffected by insertion order:
assert(Map(1->1, 2->2).hashCode() == Map(2->2, 1->1).hashCode())
assert(Map(1->1, 2->2) == Map(2->2, 1->1))
assert(a.first() == b.first())

// In contrast, this will print two different hashcodes:
println(Seq(a, b).map(_.selectExpr("hash(*)").first()))
{code}
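
To illustrate why a hash that walks entries in storage order behaves this way, here is a minimal sketch in plain Scala. It is not Spark's actual hash implementation: `chainedHash` and the seed value are hypothetical, and the sketch only assumes that each key and value is fed into a running hash in the order the entries are stored. It can be pasted into a Scala REPL or spark-shell.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical helper (not Spark's implementation): hashes entries in the
// order given, feeding the running hash into each subsequent mixing step.
def chainedHash(entries: Seq[(Int, Int)], seed: Int = 42): Int =
  entries.foldLeft(seed) { case (h, (k, v)) =>
    MurmurHash3.mix(MurmurHash3.mix(h, k), v)
  }

// The same two entries, stored in different orders.
val inOrder  = Seq(1 -> 1, 2 -> 2)
val reversed = Seq(2 -> 2, 1 -> 1)

// The mixing step is not commutative, so these two values are expected to differ.
println(chainedHash(inOrder))
println(chainedHash(reversed))
```

Because each step depends on the result of the previous one, two logically equal maps hash the same only if their entries happen to be stored in the same order.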
I think there's precedent for banning the use of MapType here because we already prohibit
MapType in aggregation / joins (SPARK-9415) and set operations (SPARK-19893).

Alternatively, we could support hashing here if we implemented support for comparable map
types (SPARK-18134).
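
As a rough illustration of what an order-insensitive map hash could look like, here is a minimal sketch in plain Scala. It is only one conceivable approach and is not taken from SPARK-18134: `orderInsensitiveHash` and the seed value are hypothetical, and the sketch assumes that keys have an ordering so that entries can be sorted into a canonical order before hashing.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical helper: sorts entries by key first, so the order in which
// the entries were inserted cannot influence the resulting hash.
def orderInsensitiveHash(entries: Seq[(Int, Int)], seed: Int = 42): Int =
  entries.sortBy(_._1).foldLeft(seed) { case (h, (k, v)) =>
    MurmurHash3.mix(MurmurHash3.mix(h, k), v)
  }

// The same two entries, stored in different orders.
val first  = Seq(1 -> 1, 2 -> 2)
val second = Seq(2 -> 2, 1 -> 1)

// Both orderings are normalized to the same sequence before hashing.
assert(orderInsensitiveHash(first) == orderInsensitiveHash(second))
```

The sort adds cost per hashed row and only works for key types that can be ordered, which is part of why this is a design question rather than a small fix.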

