spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-21413) Multiple projections with CASE WHEN fails to run generated codes
Date Fri, 14 Jul 2017 04:35:02 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-21413:
---------------------------------
    Description: 
Scala code to reproduce the issue:

{code}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val schema = StructType(StructField("fieldA", IntegerType) :: Nil)
var df = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row(1))), schema)
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df.show()
{code}


Calling {{explain()}} on the DataFrame above shows a huge CASE WHEN projection,
and {{show()}} fails with the exception below:

{code}
...
Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "apply_0$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
  at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:949)
  at org.codehaus.janino.CodeContext.write(CodeContext.java:839)
  at org.codehaus.janino.UnitCompiler.writeOpcode(UnitCompiler.java:11081)
  at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:9674)
  at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4911)
  at org.codehaus.janino.UnitCompiler.access$7700(UnitCompiler.java:206)
  at org.codehaus.janino.UnitCompiler$12.visitIntegerLiteral(UnitCompiler.java:3776)
...
{code}
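
One plausible reading of the huge projection, offered as an assumption rather than a confirmed diagnosis: if the optimizer merges the ten stacked projections into one (as a rule such as {{CollapseProject}} would), every reference to {{fieldA}} is replaced by the expression that defined it in the previous step. Each step refers to {{fieldA}} twice, so the merged expression roughly doubles per step. The sketch below models this with a toy expression tree in plain Scala. The types and the {{ProjectionGrowth}} object are hypothetical and are not Spark's Catalyst classes.

```scala
// Toy expression tree; illustrative only, not Spark's Catalyst classes.
sealed trait Expr
case object Column extends Expr   // a reference to fieldA
case object Zero extends Expr     // the literal 0
case object NullLit extends Expr  // the literal null
final case class EqualTo(left: Expr, right: Expr) extends Expr
final case class CaseWhen(cond: Expr, thenValue: Expr, elseValue: Expr) extends Expr

object ProjectionGrowth {
  // Replace every column reference with the expression that defines it.
  // This is what merging two stacked projections into one amounts to.
  def substitute(e: Expr, definition: Expr): Expr = e match {
    case Column => definition
    case EqualTo(l, r) =>
      EqualTo(substitute(l, definition), substitute(r, definition))
    case CaseWhen(c, t, el) =>
      CaseWhen(substitute(c, definition), substitute(t, definition), substitute(el, definition))
    case other => other
  }

  // Count the nodes in an expression tree.
  def size(e: Expr): Long = e match {
    case EqualTo(l, r)      => 1L + size(l) + size(r)
    case CaseWhen(c, t, el) => 1L + size(c) + size(t) + size(el)
    case _                  => 1L
  }

  // Models when($"fieldA" === 0, null).otherwise($"fieldA").
  val step: Expr = CaseWhen(EqualTo(Column, Zero), NullLit, Column)

  // The single expression left after merging n stacked projections.
  def collapsed(n: Int): Expr =
    (1 to n).foldLeft(Column: Expr)((acc, _) => substitute(step, acc))

  def main(args: Array[String]): Unit =
    (1 to 10).foreach { n =>
      println(s"$n projections -> ${size(collapsed(n))} nodes")
    }
}
```

Running {{ProjectionGrowth.main}} prints the node count after each of the ten steps. If the generated Java code grows in proportion to the tree, this would be consistent with a single generated method exceeding the JVM's 64 KB limit on method bytecode.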


Note that I could not reproduce this with a local relation (the local relation is produced by {{ConvertToLocalRelation}}):

{code}
import org.apache.spark.sql.functions._

var df = Seq(1).toDF("fieldA")
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df.show()
{code}


  was:
Scala code to reproduce the issue:

{code}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val schema = StructType(StructField("fieldA", IntegerType) :: Nil)
var df = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row(1))), schema)
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df.show()
{code}


Calling {{explain()}} on the DataFrame above shows a huge CASE WHEN projection,
and {{show()}} fails with the exception below:

{code}
...
Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "apply_0$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
  at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:949)
  at org.codehaus.janino.CodeContext.write(CodeContext.java:839)
  at org.codehaus.janino.UnitCompiler.writeOpcode(UnitCompiler.java:11081)
  at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:9674)
  at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4911)
  at org.codehaus.janino.UnitCompiler.access$7700(UnitCompiler.java:206)
  at org.codehaus.janino.UnitCompiler$12.visitIntegerLiteral(UnitCompiler.java:3776)
...
{code}


Note that I could not reproduce this with a local relation:

{code}
import org.apache.spark.sql.functions._

var df = Seq(1).toDF("fieldA")
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
df.show()
{code}



> Multiple projections with CASE WHEN fails to run generated codes
> ----------------------------------------------------------------
>
>                 Key: SPARK-21413
>                 URL: https://issues.apache.org/jira/browse/SPARK-21413
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Hyukjin Kwon
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

