spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
Date Fri, 13 Mar 2020 01:13:38 GMT
URL: https://github.com/apache/spark/pull/27872#discussion_r391981993
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
 ##########
 @@ -957,4 +956,60 @@ class DataFrameAggregateSuite extends QueryTest
       assert(error.message.contains("function count_if requires boolean type"))
     }
   }
+
+  /**
+   * NOTE: The test code tries to control the size of the for/switch statement in
+   * expand_doConsume, as well as the overall size of expand_doConsume, so that the query
+   * triggers the known Janino bug - https://github.com/janino-compiler/janino/issues/113.
+   *
+   * The expected exception message from Janino when we use a switch statement for "ExpandExec":
+   * - "Operand stack inconsistent at offset xxx: Previous size 1, now 0"
+   * which will not happen when we use an if-else-if statement for "ExpandExec".
+   *
+   * "The number of fields" and "The number of distinct aggregation functions" are the major
+   * factors in the size of the generated code: these values should be large enough to
+   * trigger the Janino bug, but they should also not be too big; otherwise one of the below
+   * exceptions might be thrown:
+   * - "expand_doConsume would be beyond 64KB"
+   * - "java.lang.ClassFormatError: Too many arguments in method signature in class file"
+   */
+  test("SPARK-31115 Lots of columns and distinct aggregations shouldn't break code generation") {
+    withSQLConf(
+      (SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true"),
+      (SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key, "10000"),
+      (SQLConf.CODEGEN_FALLBACK.key, "false"),
+      (SQLConf.CODEGEN_LOGGING_MAX_LINES.key, "-1")
+    ) {
+      var df = Seq(("1", "2", 1), ("1", "2", 2), ("2", "3", 3), ("2", "3", 4)).toDF("a", "b", "c")
+
+      // The value is tested under commit "e807118eef9e0214170ff62c828524d237bd58e3":
+      // the query fails with switch statement, whereas it passes with if-else statement.
+      // Note that the value depends on the Spark logic as well - different Spark versions
+      // may require a different value to ensure the test fails with the switch statement.
+      val numNewFields = 100
+
+      df = df.withColumns(
+        (1 to numNewFields).map { idx => s"a$idx" },
+        (1 to numNewFields).map { idx =>
+          when(col("c").mod(lit(2)).===(lit(0)), lit(idx)).otherwise(col("c"))
+        }
+      )
+
+      val aggExprs: Array[Column] = Range(1, numNewFields).map { idx =>
 
 Review comment:
   I was using a `steps` parameter and eventually removed it. Given we seem to be somewhere
between neutral and negative on adopting this patch, I'll defer addressing the nit for now.
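For context on the `steps` nit: Scala's `Range` companion accepts an optional step argument, which is presumably what the removed parameter controlled. A minimal sketch of the two forms (the names here are illustrative, not from the patch):

```scala
object RangeStepSketch extends App {
  // Default step of 1: yields 1, 2, ..., 9 (end is exclusive)
  val dense = Range(1, 10)

  // Explicit step of 2: yields 1, 3, 5, 7, 9
  val sparse = Range(1, 10, 2)

  assert(dense.size == 9)
  assert(sparse.toList == List(1, 3, 5, 7, 9))
}
```

Dropping the step argument, as the reviewer did, just leaves the dense form `Range(1, numNewFields)` used in the test.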

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

