spark-issues mailing list archives

From "Aleksander Eskilson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
Date Thu, 20 Oct 2016 14:44:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592000#comment-15592000
] 

Aleksander Eskilson commented on SPARK-17131:
---------------------------------------------

Yeah, that makes sense. So far, the issue I documented and this one seem to be the only
JIRAs that specifically exhibit the Constant Pool limit error. I'm trying to dig deeper into
it to see whether it really marks its own class of error. Given that SPARK-17702 didn't resolve
the error case I posted (even though it splits up sections of large generated code), I
suspect they are closely related but ultimately different issues. I think the splitExpressions
technique that was used in SPARK-17702, and that also appears to be employed in SPARK-16845,
could be useful for the range of different classes that can generate too many lines of code.
Seeing the issues linked together is definitely useful.
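
To illustrate the general idea of splitting generated code, here is a minimal sketch. It is
not Spark's actual splitExpressions implementation; the object name SplitSketch, the method
splitIntoMethods, and the generated apply_N helper names are all hypothetical. It only shows
how a long run of generated statements can be grouped into several small helper methods. Note
that the JVM caps the bytecode size of each method separately, while the constant pool is
capped per class, so splitting within a single class may relieve the first limit without
helping with the second.

```scala
// Illustrative sketch only: NOT Spark's CodeGenerator / splitExpressions code.
// All names here (SplitSketch, splitIntoMethods, apply_N) are hypothetical.
object SplitSketch {

  /** Groups generated Java statements into helper methods holding at most
    * `maxPerMethod` statements each, plus one entry point that calls them in order.
    */
  def splitIntoMethods(statements: Seq[String], maxPerMethod: Int): String = {
    require(maxPerMethod > 0, "maxPerMethod must be positive")
    val chunks = statements.grouped(maxPerMethod).toSeq

    // One private helper method per chunk of statements.
    val helpers = chunks.zipWithIndex.map { case (chunk, i) =>
      s"private void apply_$i(Object row) {\n" +
        chunk.map("  " + _).mkString("\n") +
        "\n}"
    }

    // The entry point only dispatches to the helpers, so it stays small.
    val calls = chunks.indices.map(i => s"  apply_$i(row);").mkString("\n")
    val entry = s"public void apply(Object row) {\n$calls\n}"

    (helpers :+ entry).mkString("\n\n")
  }
}
```

For example, SplitSketch.splitIntoMethods(Seq("a();", "b();", "c();"), 2) produces two helper
methods (apply_0 and apply_1) and an apply entry point that calls both. All of them still
belong to the same class and therefore share one constant pool.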

To that end, I'll leave mine resolved as a duplicate of SPARK-16845 for now, until I can make
use of the patch it develops, so we can see more conclusively whether they're related issues or
true duplicates. And I'll link the two "0xFFFF" issues together as related.

> Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17131
>                 URL: https://issues.apache.org/jira/browse/SPARK-17131
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Iaroslav Zeigerman
>         Attachments: _SPARK_17131__add_a_test_case_with_1000_column_DF_where_describe___fails.patch
>
>
> When reading a CSV file that contains 1776 columns, Spark and Janino fail to generate the code, with the message:
> {noformat}
> Constant pool has grown past JVM limit of 0xFFFF
> {noformat}
> When running a common select with all columns it's fine:
> {code}
> import org.apache.spark.sql.functions.col
>
> // Alias every column, then select them all.
> val allCols = df.columns.map(c => col(c).as(c + "_alias"))
> val newDf = df.select(allCols: _*)
> newDf.show()
> {code}
> But when I invoke the describe method:
> {code}
> // describe takes column names (String*), not Column objects.
> newDf.describe(newDf.columns: _*)
> {code}
> it fails with the following stack trace:
> {noformat}
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
> 	at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
> 	at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
> 	... 30 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM limit of 0xFFFF
> 	at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:402)
> 	at org.codehaus.janino.util.ClassFile.addConstantIntegerInfo(ClassFile.java:300)
> 	at org.codehaus.janino.UnitCompiler.addConstantIntegerInfo(UnitCompiler.java:10307)
> 	at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:8868)
> 	at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4346)
> 	at org.codehaus.janino.UnitCompiler.access$7100(UnitCompiler.java:185)
> 	at org.codehaus.janino.UnitCompiler$10.visitIntegerLiteral(UnitCompiler.java:3265)
> 	at org.codehaus.janino.Java$IntegerLiteral.accept(Java.java:4321)
> 	at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
> 	at org.codehaus.janino.UnitCompiler.fakeCompile(UnitCompiler.java:2605)
> 	at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4362)
> 	at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3975)
> 	at org.codehaus.janino.UnitCompiler.access$6900(UnitCompiler.java:185)
> 	at org.codehaus.janino.UnitCompiler$10.visitMethodInvocation(UnitCompiler.java:3263)
> 	at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
> 	at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
> 	at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
> 	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2662)
> 	at org.codehaus.janino.UnitCompiler.access$4400(UnitCompiler.java:185)
> 	at org.codehaus.janino.UnitCompiler$7.visitMethodInvocation(UnitCompiler.java:2627)
> 	at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
> 	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2654)
> 	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1643)
> ....
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org