spark-reviews mailing list archives

From xuanyuanking <...@git.apache.org>
Subject [GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...
Date Tue, 25 Sep 2018 09:02:21 GMT
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22326#discussion_r220111919
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1234,6 +1237,59 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
       }
     }
     
    +/**
    + * Correctly handle PythonUDF which need access both side of join side by changing the new join
    + * type to Cross.
    + */
    +object HandlePythonUDFInJoinCondition extends Rule[LogicalPlan] with PredicateHelper {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
    +    case j @ Join(_, _, joinType, condition)
    +      if condition.map(splitConjunctivePredicates).getOrElse(Nil).exists(
    +        _.collectFirst { case udf: PythonUDF => udf }.isDefined) =>
    +      if (!joinType.isInstanceOf[InnerLike] && joinType != LeftSemi) {
    +        // The current strategy only support InnerLike and LeftSemi join because other type
    +        // can not simply be resolved by adding a Cross join. If we pass the plan here, it'll
    --- End diff --
    
    Yes, I will make the comment more accurate.


---


