GitHub user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20670
Good catch! This is a real problem, but the fix looks hacky.
By definition, I think `plan.constraints` should only include constraints that refer to
`plan.output`, as that's the promise a plan can make to its parent. However, join is special:
`Join.condition` can refer to both join sides, and when we add the constraints to `Join.condition`
we are effectively making a promise to the Join's children, not to its parent. My proposal:
```
lazy val constraints: ExpressionSet = {
  if (conf.constraintPropagationEnabled) {
    allConstraints.filter { c =>
      c.references.nonEmpty && c.references.subsetOf(outputSet) && c.deterministic
    }
  } else {
    ExpressionSet(Set.empty)
  }
}

lazy val allConstraints = ExpressionSet(validConstraints
  .union(inferAdditionalConstraints(validConstraints))
  .union(constructIsNotNullConstraints(validConstraints)))
```
Then we can call `plan.allConstraints` when inferring constraints for join.
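
To make the split concrete, here is a rough standalone sketch of the idea. It does not use Spark's real classes: `Constraint`, `ToyPlan` and `ConstraintSketch` are hypothetical names made up for illustration, and a constraint is reduced to the attribute names it references. It only shows how the filter on `output` hides constraints that are still available through `allConstraints`.

```scala
// Toy model, NOT Spark's Expression/ExpressionSet: a constraint is just its
// text, the attribute names it references, and a determinism flag.
final case class Constraint(
    sql: String,
    references: Set[String],
    deterministic: Boolean = true)

// Hypothetical stand-in for a plan node: `output` plays the role of outputSet.
final case class ToyPlan(output: Set[String], allConstraints: Set[Constraint]) {
  // Same predicate as in the proposal: keep only deterministic constraints
  // whose references are non-empty and fully covered by the node's output.
  lazy val constraints: Set[Constraint] =
    allConstraints.filter { c =>
      c.references.nonEmpty && c.references.subsetOf(output) && c.deterministic
    }
}

object ConstraintSketch {
  // A node that outputs only `a`, but knows facts that also mention `b`
  // (assumed here to resemble a join whose condition refers to both sides).
  val node: ToyPlan = ToyPlan(
    output = Set("a"),
    allConstraints = Set(
      Constraint("a > 1", Set("a")),
      Constraint("a = b", Set("a", "b")),
      Constraint("rand() > 0.5", Set.empty, deterministic = false)
    )
  )

  def main(args: Array[String]): Unit = {
    // What the parent may rely on.
    println(node.constraints.map(_.sql))
    // What join constraint inference could look at instead.
    println(node.allConstraints.map(_.sql))
  }
}
```

In this toy, `a = b` is dropped from `constraints` because `b` is not in the output, yet it stays visible in `allConstraints`, which is the set the join inference would read under this proposal.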