spark-issues mailing list archives

From "Felix Cheung (JIRA)" <>
Subject [jira] [Commented] (SPARK-6817) DataFrame UDFs in R
Date Thu, 21 Jan 2016 09:04:39 GMT


Felix Cheung commented on SPARK-6817:

Thanks for putting together the doc, [~sunrui].
In this design, how does one control the partitioning? For instance, suppose one would like
to group a census DataFrame by a certain column, say MetropolitanArea, and then pass each group
to R's kmeans to cluster residents within close-by geographical areas. For the R UDFs to be
effective in this and some other cases, would one need to make sure the data is partitioned
appropriately, so that mapPartition produces a local R data.frame (assuming it fits into
memory) that has all the relevant data in it?
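
To make the concern concrete, here is a minimal sketch in plain Python, not the SparkR API and not the design under discussion. It assumes a hypothetical partition_by_key helper that routes every row with the same MetropolitanArea to the same partition, and a hypothetical centroids_per_group function standing in for the per-partition UDF (a simple mean replaces R's kmeans). The lat and lon columns and the sample rows are invented for illustration.

```python
from collections import defaultdict


def partition_by_key(rows, key, num_partitions):
    """Hypothetical helper: place all rows sharing a key value in one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # A deterministic byte-sum hash keeps the sketch reproducible across runs.
        index = sum(row[key].encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def centroids_per_group(partition, key):
    """Stand-in for the per-partition UDF: one (lat, lon) centroid per key value."""
    groups = defaultdict(list)
    for row in partition:
        groups[row[key]].append((row["lat"], row["lon"]))
    return {
        name: (
            sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points),
        )
        for name, points in groups.items()
    }


# Invented sample data; a real census DataFrame would have many more columns.
rows = [
    {"MetropolitanArea": "Seattle", "lat": 47.0, "lon": -122.0},
    {"MetropolitanArea": "Boston", "lat": 42.0, "lon": -71.0},
    {"MetropolitanArea": "Seattle", "lat": 48.0, "lon": -123.0},
    {"MetropolitanArea": "Boston", "lat": 43.0, "lon": -72.0},
]

partitions = partition_by_key(rows, "MetropolitanArea", num_partitions=3)

# Each partition is processed independently, as a per-partition UDF would be.
results = {}
for part in partitions:
    results.update(centroids_per_group(part, "MetropolitanArea"))
```

Because each key lives in exactly one partition, the per-partition function sees all of that key's rows. If rows were spread across partitions arbitrarily, each partition would compute a result from only part of a group.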

> DataFrame UDFs in R
> -------------------
>                 Key: SPARK-6817
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: SparkR, SQL
>            Reporter: Shivaram Venkataraman
> This depends on some internal interface of Spark SQL, should be done after merging into
