spark-reviews mailing list archives

From kiszk <...@git.apache.org>
Subject [GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...
Date Sun, 03 Dec 2017 14:39:11 GMT
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19865
  
    @mgaido91 @viirya  As you can see, we hit an assertion failure. Here is evidence that we pass a global variable as an argument of a split function.
    In practice, we have never guaranteed that a global variable is not passed.
    
    
    A [value](github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala) was declared as a global variable. Then it is passed as `ExprCode.value`. Finally, `value` is passed as an argument in `CodegenContext.splitExpressions`. Fortunately, these `expressions` did not update the global variable, so the result was functionally correct.
    In general, it is hard to ensure that the `expressions` perform no update to the global variable. Of course, we do not want to resort to regular expressions to detect such updates.
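    The hazard can be sketched in plain Java (all names here are hypothetical, not the actual generated code): Java passes method arguments by value, so if a split-out function receives a global (instance-field) variable as a parameter and updates it, the update is silently lost.

    ```java
    // Minimal sketch of why passing a "global variable" (a mutable state
    // field in the generated class) into a split-out function is unsafe.
    // The names smjValue / splitFunc are made up for illustration.
    public class SplitFunctionHazard {
        // stands in for a codegen mutable state field ("global variable")
        private int smjValue = 0;

        // stands in for a function emitted by splitExpressions
        private void splitFunc(int smjValueArg) {
            smjValueArg += 1; // mutates only the local copy of the argument
        }

        public int run() {
            splitFunc(smjValue); // the field's value is copied into the call
            return smjValue;     // still 0: the update inside splitFunc was lost
        }

        public static void main(String[] args) {
            System.out.println(new SplitFunctionHazard().run()); // prints 0
        }
    }
    ```

    This is why the code worked only because the split expressions happened not to update the variable; the assertion now rejects the pattern outright instead of relying on that luck.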
    
    As you said, how can we ensure that a global variable is never passed?
    
    
    
    ```
    **********************************************************************
    File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/feature.py", line 1205, in __main__.MinHashLSH
    Failed example:
    ...
        Caused by: java.lang.AssertionError: assertion failed: smj_value16 in arguments should not be declared as a global variable
        	at scala.Predef$.assert(Predef.scala:170)
        	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.org$apache$spark$sql$catalyst$expressions$codegen$CodegenContext$$isDeclaredMutableState(CodeGenerator.scala:226)
        	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext$$anonfun$9.apply(CodeGenerator.scala:854)
        	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext$$anonfun$9.apply(CodeGenerator.scala:854)
        	at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
        	at scala.collection.immutable.List.foreach(List.scala:381)
        	at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
        	at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
        	at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
        	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.splitExpressions(CodeGenerator.scala:853)
        	at org.apache.spark.sql.catalyst.expressions.HashExpression.genHashForStruct(hash.scala:395)
        	at org.apache.spark.sql.catalyst.expressions.HashExpression.computeHashWithTailRec(hash.scala:421)
        	at org.apache.spark.sql.catalyst.expressions.HashExpression.computeHash(hash.scala:429)
        	at org.apache.spark.sql.catalyst.expressions.HashExpression$$anonfun$1.apply(hash.scala:276)
        	at org.apache.spark.sql.catalyst.expressions.HashExpression$$anonfun$1.apply(hash.scala:273)
        	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        	at org.apache.spark.sql.catalyst.expressions.HashExpression.doGenCode(hash.scala:273)
        	at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:107)
        	at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
        	at scala.Option.getOrElse(Option.scala:121)
        	at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:104)
        	at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsumeWithKeys(HashAggregateExec.scala:772)
        	at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsume(HashAggregateExec.scala:173)
        	at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:162)
        	at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:35)
        	at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:65)
        	at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:162)
        	at org.apache.spark.sql.execution.joins.SortMergeJoinExec.consume(SortMergeJoinExec.scala:36)
        	at org.apache.spark.sql.execution.joins.SortMergeJoinExec.doProduce(SortMergeJoinExec.scala:626)
        	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
        	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:80)
        	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
        	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
        	at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:80)
        	at org.apache.spark.sql.execution.joins.SortMergeJoinExec.produce(SortMergeJoinExec.scala:36)
        	at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:45)
        	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
        	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:80)
        	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
        	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
        	at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:80)
        	at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:35)
        	at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:647)
        	at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:165)
        	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
        	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:80)
        	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
        	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
        	at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:80)
        	at org.apache.spark.sql.execution.aggregate.HashAggregateExec.produce(HashAggregateExec.scala:39)
        	at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:374)
        	at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:422)
        	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
        	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:113)
        	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
        	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
        	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
        	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:89)
        	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:125)
        	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:116)
        	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    ...
    ```

