spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] viirya commented on a change in pull request #24981: [SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs
Date Thu, 12 Sep 2019 17:59:58 GMT
viirya commented on a change in pull request #24981: [SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs
URL: https://github.com/apache/spark/pull/24981#discussion_r323494985
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
 ##########
 @@ -523,6 +523,47 @@ class RelationalGroupedDataset protected[sql](
     Dataset.ofRows(df.sparkSession, plan)
   }
 
+  /**
+   * Applies a vectorized python user-defined function to each cogrouped data.
+   * The user-defined function defines a transformation:
+   * `pandas.DataFrame`, `pandas.DataFrame` -> `pandas.DataFrame`.
+   *  For each group in the cogrouped data, all elements in the group are passed as a
+   * `pandas.DataFrame` and the results for all cogroups are combined into a new [[DataFrame]].
+   *
+   * This function uses Apache Arrow as serialization format between Java executors and Python
+   * workers.
+   */
+  private[sql] def flatMapCoGroupsInPandas
+  (r: RelationalGroupedDataset, expr: PythonUDF): DataFrame = {
 
 Review comment:
   indent looks wrong?
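
The comment does not say which layout is expected. One reading is that a signature too long for one line should keep the opening parenthesis on the `def` line and wrap its parameters with a 4-space indent, as the Scala style guide used by Spark recommends. The sketch below shows that layout only. `PythonUDF`, `DataFrame`, and the class body are hypothetical stand-ins so the snippet compiles without Spark, and the `private[sql]` modifier is dropped because the sketch lives outside the `sql` package.

```scala
// Hypothetical stand-ins so the sketch compiles without Spark on the classpath.
final case class PythonUDF(name: String)
final case class DataFrame(description: String)

class RelationalGroupedDataset(val label: String) {
  // The opening parenthesis stays on the `def` line; wrapped parameters
  // are indented 4 spaces and the body is indented 2 spaces.
  def flatMapCoGroupsInPandas(
      r: RelationalGroupedDataset,
      expr: PythonUDF): DataFrame = {
    // Placeholder body: the real method builds a logical plan instead.
    DataFrame(s"cogroup($label, ${r.label}) via ${expr.name}")
  }
}
```

Only the layout of the signature is the point here; the body is a placeholder and does not reflect what the pull request actually implements.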


