spark-issues mailing list archives

From "Felix Cheung (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-6817) DataFrame UDFs in R
Date Thu, 21 Jan 2016 09:04:39 GMT

    [ https://issues.apache.org/jira/browse/SPARK-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110299#comment-15110299 ]

Felix Cheung commented on SPARK-6817:
-------------------------------------

Thanks for putting together the doc, [~sunrui].
In this design, how does one control the partitioning? For instance, suppose one would like
to group a census DataFrame by a certain column, say MetropolitanArea, and then pass each group
to R's kmeans to cluster residents within close-by geographical areas. For the R UDFs
to be effective in this and some other cases, one would need to make sure the data is partitioned
appropriately, so that mapPartition produces a local R data.frame (assuming it fits
into memory) that has all the relevant data in it.
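
To make the concern concrete, here is a minimal sketch in plain Python. It is not SparkR and not the API proposed in the design doc. The helper names (partition_by_key, apply_per_group, centroid) and the sample rows are hypothetical, and a simple centroid stands in for R's kmeans. It only illustrates that hash-partitioning on the grouping column puts every row for a given MetropolitanArea into the same partition, so a per-partition function sees complete groups.

```python
from collections import defaultdict


def partition_by_key(rows, key, num_partitions):
    """Hash-partition rows so all rows sharing a key value land in one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def apply_per_group(partition, key, func):
    """Stand-in for a per-partition UDF: group local rows by key, call func per group."""
    groups = defaultdict(list)
    for row in partition:
        groups[row[key]].append(row)
    return {k: func(v) for k, v in groups.items()}


def centroid(group):
    """Placeholder for kmeans: the mean latitude/longitude of one group."""
    n = len(group)
    return (sum(r["lat"] for r in group) / n, sum(r["lon"] for r in group) / n)


# Hypothetical census rows, invented for illustration only.
rows = [
    {"MetropolitanArea": "Seattle", "lat": 47.6, "lon": -122.3},
    {"MetropolitanArea": "Portland", "lat": 45.5, "lon": -122.7},
    {"MetropolitanArea": "Seattle", "lat": 47.7, "lon": -122.4},
    {"MetropolitanArea": "Portland", "lat": 45.6, "lon": -122.6},
]

partitions = partition_by_key(rows, "MetropolitanArea", num_partitions=2)

results = {}
for part in partitions:
    # Each partition holds complete groups, so no cross-partition merge is needed.
    results.update(apply_per_group(part, "MetropolitanArea", centroid))
```

Without the partitioning step, rows for one area could be spread over several partitions, and each per-partition call would see only a fragment of the group.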
 

> DataFrame UDFs in R
> -------------------
>
>                 Key: SPARK-6817
>                 URL: https://issues.apache.org/jira/browse/SPARK-6817
>             Project: Spark
>          Issue Type: New Feature
>          Components: SparkR, SQL
>            Reporter: Shivaram Venkataraman
>
> This depends on some internal interface of Spark SQL, should be done after merging into
> Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

