spark-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.
Date Tue, 11 Dec 2018 17:34:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717622#comment-16717622 ]

ASF GitHub Bot commented on SPARK-26224:
----------------------------------------

AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446290728
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26224
>                 URL: https://issues.apache.org/jira/browse/SPARK-26224
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>         Environment: MacBook, IntelliJ editor. Ran the sample code below as a unit test.
>            Reporter: Dorjee Tsering
>            Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. It tries to add 3000 new columns to a base DataFrame with 1 column.
>
> {code:scala}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.functions.lit
> import org.apache.spark.sql.types.{DataTypes, StructField}
> import spark.implicits._
> val newColumnsToBeAdded: Seq[StructField] = for (i <- 1 to 3000) yield new StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq(1).toDF("employee_id")
> // Chain 3000 withColumn calls, one per new column
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => df.withColumn(newColumn.name, lit(0)))
> result.show(false)
> {code}
> This ends with the following stack trace:
> java.lang.StackOverflowError
>  at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
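
The trace above shows transformDown recursing through the plan tree, and the pull request title refers to many projections being created by successive withColumn calls. As a rough sketch of a possible workaround (not the change made in the pull request), the new columns can be built as a list of expressions and added in a single select, which should keep the plan shallow. The local SparkSession setup and the names newColumns and result are assumptions for illustration; in spark-shell, spark already exists.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-26224-sketch")
  .getOrCreate()
import spark.implicits._

val baseDataFrame: DataFrame = Seq(1).toDF("employee_id")

// Build all 3000 column expressions up front instead of chaining withColumn.
val newColumns: Seq[Column] = (1 to 3000).map(i => lit(0).as(s"field_$i"))

// One select call: the existing columns ("*") followed by the new ones.
val result: DataFrame = baseDataFrame.select(col("*") +: newColumns: _*)

// employee_id plus the 3000 added columns
println(result.columns.length)
```

The trade-off is that every new column must be expressible against the base DataFrame; columns that depend on earlier added columns would still need separate steps.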



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

