spark-issues mailing list archives

From "Xiao Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-17217) Codegeneration fails for describe() on many columns
Date Wed, 07 Feb 2018 01:43:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354848#comment-16354848 ]

Xiao Li commented on SPARK-17217:
---------------------------------

It should be resolved by https://issues.apache.org/jira/browse/SPARK-22510. If not, please
re-open it.

> Codegeneration fails for describe() on many columns
> ---------------------------------------------------
>
>                 Key: SPARK-17217
>                 URL: https://issues.apache.org/jira/browse/SPARK-17217
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Kalle Jepsen
>            Priority: Major
>
> Consider the following minimal python script:
> {code:python}
> import pyspark
> from pyspark.sql import functions as F
> conf = pyspark.SparkConf()
> sc = pyspark.SparkContext(conf=conf)
> spark = pyspark.sql.SQLContext(sc)
> ncols = 510
> nrows = 10
> df = spark.range(0, nrows)
> s = df.select(
>     [
>         F.randn(seed=i).alias('C%i' % i) for i in range(ncols)
>     ]
> ).describe()
> {code}
> This fails for {{ncols >= 510}} with a traceback of 3.6M (!) lines, saying something like:
> {noformat}
> 16/08/24 16:50:57 ERROR CodeGenerator: failed to compile: java.io.EOFException
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificMutableProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificMutableProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseMutableProjection {
> ...
> /* 7372 */   private boolean isNull_1969;
> /* 7373 */   private double value_1969;
> /* 7374 */   private boolean isNull_1970;
> ...
> /* 11035 */       double value14944 = -1.0;
> /* 11036 */
> /* 11037 */
> /* 11038 */       if (!evalExpr1052IsNull) {
> /* 11039 */
> /* 11040 */         isNull14944 = false; // resultCode could change nullability.
> /* 11041 */         value14944 = evalExpr1326Value - evalExpr1052Value;
> /* 11042 */
> ...
> /* 157621 */     apply1_6(i);
> /* 157622 */     return mutableRow;
> /* 157623 */   }
> /* 157624 */ }
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
> 	at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
> 	at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
> 	at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
> 	... 30 more
> Caused by: java.io.EOFException
> 	at java.io.DataInputStream.readFully(DataInputStream.java:197)
> 	at java.io.DataInputStream.readFully(DataInputStream.java:169)
> 	at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1383)
> 	at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:555)
> 	at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:518)
> 	at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:185)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:914)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:912)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> 	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> 	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:912)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:884)
> 	... 35 more
> {noformat}
> I've seen something similar in an earlier Spark version ([reported in this issue|https://issues.apache.org/jira/browse/SPARK-14138]).
> My conclusion is that {{describe}} was never meant to be used non-interactively on very wide dataframes. Am I right?
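
On versions where the failure still occurs, one possible workaround is to call describe() on a limited number of columns at a time, so that each generated projection stays small. The sketch below is an assumption, not part of the original report and not the SPARK-22510 fix. The helper name describe_in_batches and the default batch_size of 100 are hypothetical; the report only establishes that the failure appears at ncols >= 510. The helper relies only on DataFrame.columns, select(), describe() and collect().

```python
def describe_in_batches(df, batch_size=100):
    """Run describe() on at most batch_size columns per call.

    Hypothetical helper: returns {column: {statistic: value}}, where the
    values are the strings that describe() itself produces.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    columns = list(df.columns)
    stats = {}
    for start in range(0, len(columns), batch_size):
        batch = columns[start:start + batch_size]
        described = df.select(batch).describe()
        # describe() may skip columns of unsupported types, so read the
        # column names from its result rather than from the batch.
        described_cols = [c for c in described.columns if c != "summary"]
        # Each collected row holds one statistic; "summary" names it.
        for row in described.collect():
            for name in described_cols:
                stats.setdefault(name, {})[row["summary"]] = row[name]
    return stats
```

With the dataframe from the script above, this would be called as describe_in_batches(df.select([...]), batch_size=100). It runs one Spark job per batch, so it trades some runtime for smaller generated code.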




