spark-reviews mailing list archives

From GitBox <>
Subject [GitHub] [spark] HyukjinKwon opened a new pull request #29098: [SPARK-32300][PYTHON][2.4] toPandas should work from a Spark DataFrame with no partitions
Date Tue, 14 Jul 2020 08:39:09 GMT

HyukjinKwon opened a new pull request #29098:

   ### What changes were proposed in this pull request?
   This PR proposes to simply bypass the case where the computed array size is negative when collecting data from a Spark DataFrame with no partitions in `toPandas`:

   ```python
   spark.sparkContext.emptyRDD().toDF("col1 int").toPandas()
   ```

   In master and branch-3.0, this was fixed together with another change, but that fix was intentionally not ported back.
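   The shape of the bypass can be sketched in plain Python (an illustration only, not Spark's actual Scala code; the helper name and buffer layout are assumptions): when partition results can arrive out of order, a buffer sized `num_partitions - 1` is allocated, and with zero partitions that size is negative, so the fix clamps it to zero.

   ```python
   def buffer_out_of_order_results(num_partitions: int) -> list:
       """Hypothetical helper illustrating the guard: out-of-order partition
       results are buffered in an array of size num_partitions - 1, which is
       negative when the DataFrame has no partitions at all."""
       # The bypass: clamp the size to zero instead of letting a negative
       # allocation fail (Java would raise NegativeArraySizeException here).
       size = max(num_partitions - 1, 0)
       return [None] * size
   ```

   With zero partitions the helper returns an empty buffer instead of failing, which is the behavior the patch restores for `toPandas`.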
   ### Why are the changes needed?
   To make an empty Spark DataFrame convertible to a pandas DataFrame via `toPandas`.
   ### Does this PR introduce _any_ user-facing change?
   Yes. Before this change, calling `toPandas` on a DataFrame with no partitions failed:

   ```python
   spark.sparkContext.emptyRDD().toDF("col1 int").toPandas()
   ```

   ```
   Caused by: java.lang.NegativeArraySizeException
   	at org.apache.spark.sql.Dataset$$anonfun$collectAsArrowToPython$1$$anonfun$apply$17.apply(Dataset.scala:3293)
   	at org.apache.spark.sql.Dataset$$anonfun$collectAsArrowToPython$1$$anonfun$apply$17.apply(Dataset.scala:3287)
   	at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
   	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
   ```

   After this change, the same call returns an empty pandas DataFrame:

   ```
   Empty DataFrame
   Columns: [col1]
   Index: []
   ```
   ### How was this patch tested?
   Manually tested, and a unit test was added.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
