spark-issues mailing list archives

From "Nick Pentreath (JIRA)"
Subject [jira] [Commented] (SPARK-16365) Ideas for moving "mllib-local" forward
Date Fri, 08 Jul 2016 19:49:11 GMT


Nick Pentreath commented on SPARK-16365:

Good question, and part of the reason for getting the discussion going here. In general (IMO)
the short answer is "no": I think Spark should be the tool for training models on moderately
large to extremely large datasets, but not necessarily for completely general machine learning.

I think the idea behind {{mllib-local}} is potentially twofold: (i) make it easier to use
Spark models / pipelines in production scenarios, and (ii) enhance the linalg primitives available
to devs / users.

> Ideas for moving "mllib-local" forward
> --------------------------------------
>                 Key: SPARK-16365
>                 URL:
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: Nick Pentreath
> Since SPARK-13944 is all done, we should all think about what the "next steps" might
> be for {{mllib-local}}. E.g., it could be "improve Spark's linear algebra", or "investigate
> how we will implement local models/pipelines in Spark", etc.
> This ticket is for comments, ideas, brainstorming and PoCs. The separation of linalg
> into a standalone project turned out to be significantly more complex than originally expected.
> So I vote we devote sufficient discussion and time to planning out the next move :)
