spark-dev mailing list archives

From Cheng Lian <>
Subject Re: [DISCUSS] Extending public API
Date Sun, 23 Feb 2014 09:15:16 GMT
I think SPARK-1063 (PR-503) “Add .sortBy(f) method on RDD” would be a good example. Note
that I’m not saying that this PR is already qualified to be accepted; just take it as an
example.
JIRA issue:
GitHub PR:

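For context, the idiom under discussion — a `sortBy(f)` on `RDD` — is the kind of extension that can be expressed through operations the RDD API already has (`keyBy`, `sortByKey`, `values`). The sketch below is illustrative only; the enrichment-class name and signature are assumptions, not necessarily what PR-503 actually does:

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Illustrative sketch: sortBy(f) composed from existing RDD operations.
// SortByOps is a hypothetical name; the real PR may differ.
implicit class SortByOps[T: ClassTag](self: RDD[T]) {
  def sortBy[K: Ordering: ClassTag](f: T => K,
                                    ascending: Boolean = true): RDD[T] =
    self.keyBy(f)             // RDD[(K, T)]: pair each element with its sort key
        .sortByKey(ascending) // existing shuffle-based sort on the keys
        .values               // drop the keys, keeping the original elements
}
```

This is the trade-off Mridul raises below: the idiom is a one-liner over the existing API (with some overhead from materializing the key pairs), so the question is whether it belongs in core or in a piggybank-style add-on.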
On Feb 23, 2014, at 2:23 PM, Amandeep Khurana <> wrote:

> Mridul,
> Can you give examples of APIs that people have contributed (or wanted
> to contribute) but that you would categorize as belonging in a
> piggybank-like library (sparkbank)? Curious to know how you'd decide
> what should go where.
> Amandeep
>> On Feb 22, 2014, at 10:06 PM, Mridul Muralidharan <> wrote:
>> Hi,
>> Over the past few months, I have seen a bunch of pull requests which have
>> extended the Spark API ... most commonly RDD itself.
>> Most of them are either relatively niche cases of specialization (which
>> might not be useful in most cases) or idioms which can be expressed
>> (sometimes with a minor perf penalty) using the existing API.
>> While all of them have non-zero value (hence the effort to contribute, and
>> gladly welcomed!), they extend the API in nontrivial ways and carry a
>> maintenance cost ... and we already have a pending effort to clean up our
>> interfaces prior to 1.0.
>> I believe there is a need to keep the exposed API succinct, expressive and
>> functional in Spark, while at the same time encouraging extensions and
>> specialization within the Spark codebase so that other users can benefit
>> from the shared contributions.
>> One approach could be to start something akin to Piggybank in Pig to
>> collect user-contributed specializations, helper utils, etc.: bundled as
>> part of Spark, but not part of core itself.
>> Thoughts, comments?
>> Regards,
>> Mridul
