spark-issues mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-9785) HashPartitioning compatibility should consider expression ordering
Date Tue, 11 Aug 2015 15:53:45 GMT

     [ https://issues.apache.org/jira/browse/SPARK-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai resolved SPARK-9785.
-----------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 8074
[https://github.com/apache/spark/pull/8074]

> HashPartitioning compatibility should consider expression ordering
> ------------------------------------------------------------------
>
>                 Key: SPARK-9785
>                 URL: https://issues.apache.org/jira/browse/SPARK-9785
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> HashPartitioning compatibility is defined w.r.t. the _set_ of expressions, but in other
> contexts the ordering of those expressions matters. This is illustrated by the following
> regression test:
> {code}
>   test("HashPartitioning compatibility") {
>     val expressions = Seq(Literal(2), Literal(3))
>     // Consider two HashPartitionings that have the same _set_ of hash expressions but
>     // which are created with different orderings of those expressions:
>     val partitioningA = HashPartitioning(expressions, 100)
>     val partitioningB = HashPartitioning(expressions.reverse, 100)
>     // These partitionings are not considered equal:
>     assert(partitioningA != partitioningB)
>     // However, they both satisfy the same clustered distribution:
>     val distribution = ClusteredDistribution(expressions)
>     assert(partitioningA.satisfies(distribution))
>     assert(partitioningB.satisfies(distribution))
>     // Both partitionings are compatible with and guarantee each other:
>     assert(partitioningA.compatibleWith(partitioningB))
>     assert(partitioningB.compatibleWith(partitioningA))
>     assert(partitioningA.guarantees(partitioningB))
>     assert(partitioningB.guarantees(partitioningA))
>     // Given all of this, we would expect these partitionings to compute the same hashcode
>     // for any given row:
>     def computeHashCode(partitioning: HashPartitioning): Int = {
>       val hashExprProj = new InterpretedMutableProjection(partitioning.expressions, Seq.empty)
>       hashExprProj.apply(InternalRow.empty).hashCode()
>     }
>     assert(computeHashCode(partitioningA) === computeHashCode(partitioningB))
>   }
> {code}
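
The ordering problem described above can be reproduced without any Spark classes. The following is a minimal stand-alone sketch, not the code from the pull request: `OrderingSketch`, `orderedHash`, and `partitionFor` are hypothetical names, and the 31-based polynomial hash is a stand-in assumption for whatever hash the row projection actually computes. It only shows that two key sequences forming the same set can map to different partitions once the hash depends on evaluation order.

```scala
// Stand-alone illustration; does not use Spark's HashPartitioning.
object OrderingSketch {
  // Hypothetical order-sensitive hash over already-evaluated key values
  // (assumption: a simple 31-based polynomial hash, chosen for clarity).
  def orderedHash(keys: Seq[Int]): Int =
    keys.foldLeft(0)((h, k) => 31 * h + k)

  // Hypothetical helper: map a key sequence to a partition id in [0, numPartitions).
  def partitionFor(keys: Seq[Int], numPartitions: Int): Int =
    java.lang.Math.floorMod(orderedHash(keys), numPartitions)

  def main(args: Array[String]): Unit = {
    val keys = Seq(2, 3)
    // Same set of keys, different ordering.
    println(s"same set: ${keys.toSet == keys.reverse.toSet}")
    println(s"partition for ${keys}: ${partitionFor(keys, 100)}")
    println(s"partition for ${keys.reverse}: ${partitionFor(keys.reverse, 100)}")
  }
}
```

With these assumptions, `Seq(2, 3)` hashes to 65 and `Seq(3, 2)` hashes to 95, so a set-based compatibility check would treat two layouts as interchangeable even though matching rows land in different partitions.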




