spark-issues mailing list archives

From "Lucas Partridge (JIRA)" <>
Subject [jira] [Commented] (SPARK-19498) Discussion: Making MLlib APIs extensible for 3rd party libraries
Date Fri, 06 Jul 2018 08:40:00 GMT


Lucas Partridge commented on SPARK-19498:

[~Peter Knight] Ironically, Spark itself already auto-generates the boilerplate code for
its shared params. E.g., see [] .
That is how it ends up with the shared params at [] .
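
To illustrate the general idea of generating shared-param boilerplate from a short description, here is a minimal sketch. It is not Spark's actual generator: `ParamDesc`, `gen_has_param_trait`, and `gen_shared_params` are hypothetical names, and the emitted Scala text is only an assumed approximation of what a `Has<Name>` trait could look like.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ParamDesc:
    """Hypothetical description of one shared param (not a Spark class)."""
    name: str        # param name, e.g. "maxIter"
    scala_type: str  # Scala type of the value, e.g. "Int"
    doc: str         # human-readable description of the param


def gen_has_param_trait(p: ParamDesc) -> str:
    """Render illustrative Scala source for a `Has<Name>` trait."""
    cap = p.name[0].upper() + p.name[1:]  # "maxIter" -> "MaxIter"
    return (
        f"trait Has{cap} extends Params {{\n"
        f"  final val {p.name}: Param[{p.scala_type}] =\n"
        f'    new Param[{p.scala_type}](this, "{p.name}", "{p.doc}")\n'
        f"  final def get{cap}: {p.scala_type} = $({p.name})\n"
        f"}}\n"
    )


def gen_shared_params(params: List[ParamDesc]) -> str:
    """Concatenate one generated trait per param description."""
    return "\n".join(gen_has_param_trait(p) for p in params)


if __name__ == "__main__":
    # Example descriptions; the docs are placeholders for illustration.
    print(gen_shared_params([
        ParamDesc("maxIter", "Int", "maximum number of iterations"),
        ParamDesc("tol", "Double", "convergence tolerance"),
    ]))
```

A 3rd-party library could apply the same approach to its own params: keep a list of descriptions in one place and regenerate the traits instead of hand-writing each getter.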

> Discussion: Making MLlib APIs extensible for 3rd party libraries
> ----------------------------------------------------------------
>                 Key: SPARK-19498
>                 URL:
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Joseph K. Bradley
>            Priority: Critical
> Per the recent discussion on the dev list, this JIRA is for discussing how we can make
> MLlib DataFrame-based APIs more extensible, especially for the purpose of writing 3rd-party
> libraries with APIs extended from the MLlib APIs (for custom Transformers, Estimators, etc.).
> * For people who have written such libraries, what issues have you run into?
> * What APIs are not public or extensible enough?  Do they require changes before being
>   made more public?
> * Are APIs for non-Scala languages such as Java and Python friendly or extensive enough?
> The easy answer is to make everything public, but that would be terrible of course in
> the long-term.  Let's discuss what is needed and how we can present stable, sufficient, and
> easy-to-use APIs for 3rd-party developers.

This message was sent by Atlassian JIRA
