spark-issues mailing list archives

From "Patrick Wendell (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-7230) Make RDD API private in SparkR for Spark 1.4
Date Thu, 30 Apr 2015 01:14:06 GMT

    [ https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520635#comment-14520635
] 

Patrick Wendell edited comment on SPARK-7230 at 4/30/15 1:13 AM:
-----------------------------------------------------------------

Yes - removing APIs is really difficult for existing users. That's why the proposal limits
the number of exposed APIs: users of Spark expect exposed APIs to be
supported. Part of merging into the upstream project is looking at which APIs the committership
is comfortable supporting in the long term. As it stands, there isn't yet widespread support
in the committership for supporting low-level ETL code in R in the long term. We'd rather
have narrower and simpler APIs and add more enhancements over time according to user demand.

Of course we'll make a good-faith effort to support APIs that are useful to existing projects.


was (Author: pwendell):
Yes - removing API's is really difficult for existing users. That's why the proposal does
limit the number of exposed API's, because users of Spark have an exception they will be supported.
Part of merging into the upstream project is looking at which API's the commitership are comfortable
supporting in the long term. As it stands, there isn't yet widespread support in the committership
for supporting low level ETL code in R in the long term. We'd rather have narrower and simpler
API's and add more enhancements over time according to user demand.

Of course we'll make a good faith effort to support API's that are useful to existing projects.

> Make RDD API private in SparkR for Spark 1.4
> --------------------------------------------
>
>                 Key: SPARK-7230
>                 URL: https://issues.apache.org/jira/browse/SPARK-7230
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>    Affects Versions: 1.4.0
>            Reporter: Shivaram Venkataraman
>            Assignee: Shivaram Venkataraman
>            Priority: Critical
>
> This ticket proposes making the RDD API in SparkR private for the 1.4 release. The motivations
> for doing so are discussed in a larger design document aimed at a more top-down design of
> the SparkR APIs. A first cut that discusses motivation and proposed changes can be found at
> http://goo.gl/GLHKZI
> The main points in that document that relate to this ticket are:
> - The RDD API requires knowledge of the distributed system and is pretty low level. This
>   is not very suitable for a number of R users who are used to more high-level packages that
>   work out of the box.
> - The RDD implementation in SparkR is not fully robust right now: we are missing features
>   like spilling for aggregation and handling partitions which don't fit in memory. There are
>   further limitations, like the lack of hashCode for non-native types, which might affect user
>   experience.
> The only change we will make for now is to not export the RDD functions as public methods
> in the SparkR package. I will create another ticket for discussing the public
> API for 1.5 in more detail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


