spark-issues mailing list archives

From "Aleksander Eskilson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
Date Thu, 19 Jan 2017 16:13:26 GMT

    [ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830163#comment-15830163 ]

Aleksander Eskilson commented on SPARK-18016:
---------------------------------------------

I've put together a potential solution to this issue. It works similarly to the expression-splitting
technique used to resolve the various 64 KB errors that popped up when some generated methods grew
too large. In my patch, when the generated code for expressions reaches a 1600k byte threshold, code
generation declares a new nested private class, and any newly added functions are inlined into that
class rather than into the main outer class. Additional nested classes are created as needed each
time the volume of generated code reaches the threshold again. These nested classes and the code
they encompass are placed at the bottom of the outer class.

In testing, this solution works well even when the volume of generated code reaches around a million
lines. Since all mutable state is declared globally on the outer class, functions inlined into the
nested private classes can still access and mutate it just as if they had been inlined into the
outer class, as in the normal case.
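
To make the shape of this concrete, here is a rough sketch in plain Java of what a split generated
class could look like under this scheme. The class, method, and field names are simplified stand-ins,
not the actual generated source:

{code}
// Illustrative sketch only; names are simplified stand-ins for generated code.
public class SpecificUnsafeProjection {

  // All mutable state stays declared on the outer class, so methods inlined
  // into nested classes can read and write it directly.
  private int mutableStateValue = 0;

  // Once the accumulated generated code crosses the size threshold, further
  // functions are emitted into a nested private class instead of the outer one.
  private class NestedClass1 {
    private void apply_1001(Object input) {
      mutableStateValue += 1;  // same access an outer-class method would have
    }
  }

  private class NestedClass2 {
    private void apply_2001(Object input) {
      mutableStateValue += 1;
    }
  }

  private NestedClass1 nestedClass1 = new NestedClass1();
  private NestedClass2 nestedClass2 = new NestedClass2();

  public Object apply(Object input) {
    // Calls are routed through the nested-class instances; the bodies of the
    // inlined functions, and the local state they reference, contribute to the
    // nested classes' constant pools rather than the outer class's.
    nestedClass1.apply_1001(input);
    nestedClass2.apply_2001(input);
    return input;
  }
}
{code}

The point is simply that the inlined function bodies live in the nested classes, while the mutable
state they touch stays on the outer class.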

This solution has the added benefit that code generation for the common cases remains unaffected,
since a very large quantity of code must be generated before any class splitting occurs. Yet it
still resolves the constant pool errors, because the functions and local state inlined into the
nested classes don't count toward the constant pool of the main outer class.

Thoughts on this?

I've put together a pull-request for the patch here, [#16648|https://github.com/apache/spark/pull/16648].

> Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
> -----------------------------------------------------------------
>
>                 Key: SPARK-18016
>                 URL: https://issues.apache.org/jira/browse/SPARK-18016
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Aleksander Eskilson
>
> When attempting to encode collections of large Java objects to Datasets having very wide or deeply nested schemas, code generation can fail, yielding:
> {code}
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection has grown past JVM limit of 0xFFFF
> 	at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499)
> 	at org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439)
> 	at org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358)
> 	at org.codehaus.janino.UnitCompiler.writeConstantMethodrefInfo(UnitCompiler.java:11114)
> 	at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4547)
> 	at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206)
> 	at org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774)
> 	at org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762)
> 	at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
> 	at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762)
> 	at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933)
> 	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180)
> 	at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206)
> 	at org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151)
> 	at org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139)
> 	at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
> 	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139)
> 	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112)
> 	at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206)
> 	at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377)
> 	at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370)
> 	at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558)
> 	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370)
> 	at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450)
> 	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2811)
> 	at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262)
> 	at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234)
> 	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538)
> 	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
> 	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
> 	at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
> 	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
> 	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
> 	at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
> 	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
> 	at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
> 	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
> 	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:420)
> 	at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:206)
> 	at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:374)
> 	at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:369)
> 	at org.codehaus.janino.Java$AbstractPackageMemberClassDeclaration.accept(Java.java:1309)
> 	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
> 	at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:345)
> 	at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:396)
> 	at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:311)
> 	at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:229)
> 	at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:196)
> 	at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:91)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:905)
> 	... 35 more
> {code}
> During generation of the code for SpecificUnsafeProjection, all the mutable variables are declared up front. If there are too many, it seems the class exceeds some resource limit, evidently the constant pool.
> This issue seems related to (but is not fixed by) SPARK-17702, which itself was about the size of individual methods growing beyond the 64 KB limit. SPARK-17702 was resolved by breaking extractions into smaller methods, but does not seem to have resolved this issue.
> I've created a small project [1] where I declare a list of "wide" and "nested" Bean objects that I attempt to encode to a Dataset. This code can trigger the failure for Spark 2.1.0-SNAPSHOT; a minimal illustrative sketch of the same pattern follows below.

> [1] - https://github.com/bdrillard/spark-codegen-error
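
For context, the reproduction in [1] amounts to encoding a very wide Java bean into a Dataset. A
hypothetical sketch of that pattern (the bean and field names here are made up, not taken from the
linked project) might look like:

{code}
// Hypothetical repro sketch; WideBean and its fields are illustrative only.
// A bean with hundreds of (possibly nested) properties is what pushes the
// generated SpecificUnsafeProjection class past the 0xFFFF constant pool limit.
import java.util.Collections;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class WideBeanRepro {

  // Stand-in for a very wide / deeply nested Java bean.
  public static class WideBean implements java.io.Serializable {
    private int field0;
    public int getField0() { return field0; }
    public void setField0(int field0) { this.field0 = field0; }
    // ... hundreds more bean properties in a real reproduction
  }

  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .appName("constant-pool-repro")
        .getOrCreate();

    // Encoding the bean triggers code generation for the projection, which is
    // where the JaninoRuntimeException above surfaces once the bean is wide enough.
    Dataset<WideBean> ds = spark.createDataset(
        Collections.singletonList(new WideBean()),
        Encoders.bean(WideBean.class));

    ds.show();
    spark.stop();
  }
}
{code}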



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

