spark-dev mailing list archives

From Joseph Bradley <jos...@databricks.com>
Subject Re: Spark.ml roadmap 2.3.0 and beyond
Date Tue, 20 Mar 2018 21:51:23 GMT
The promised roadmap JIRA: https://issues.apache.org/jira/browse/SPARK-23758

Note it doesn't have much explicitly listed yet, but committers can add
items as they agree to shepherd them.  (Committers, make sure to check what
you're currently listed as shepherding!)  The links for searching can be
useful too.

On Thu, Dec 7, 2017 at 3:55 PM, Stephen Boesch <javadba@gmail.com> wrote:

> Thanks Joseph.  We can wait for post 2.3.0.
>
> 2017-12-07 15:36 GMT-08:00 Joseph Bradley <joseph@databricks.com>:
>
>> Hi Stephen,
>>
>> I used to post those roadmap JIRAs to share instructions for contributing
>> to MLlib and to try to coordinate amongst committers.  My feeling was that
>> the coordination aspect was of mixed success, so I did not post one for
>> 2.3.  I'm glad you pinged about this; if those were useful, then I can plan
>> on posting one for the release after 2.3.  As far as identifying
>> committers' plans, the best option right now is to look for Shepherds in
>> JIRA as well as the few mailing list threads about directions.
>>
>> For myself, I'm mainly focusing on fixing some issues with persistence
>> for custom algorithms in PySpark (done), adding the image schema (done),
>> and using ML Pipelines in Structured Streaming (WIP).
>>
>> Joseph
>>
>> On Wed, Nov 29, 2017 at 6:52 AM, Stephen Boesch <javadba@gmail.com>
>> wrote:
>>
>>> There are several JIRAs and/or PRs that contain logic the Data
>>> Science teams that I work with use in their local models. We are trying to
>>> determine if/when these features may gain traction again.  In at least one
>>> case all of the work was done, but the shepherd said that getting it
>>> committed was a lower priority than other tasks; one specifically
>>> mentioned was the mllib/ml parity work that has been ongoing for nearly three
>>> years.
>>>
>>> To prioritize the work that the ML platform does, it would be
>>> helpful to know which, if any, of those tasks the community is going to
>>> move ahead, since we could then focus on other ones instead of
>>> duplicating the effort.
>>>
>>> In addition there are some engineering code jam sessions that happen
>>> periodically: knowing which features are actively on the roadmap would *certainly*
>>> influence our selection of work.  The roadmaps from 2.2.0 and earlier
>>> were a very good starting point to understand not just the specific work in
>>> progress - but also the current mindset/thinking of the committers in terms
>>> of general priorities.
>>>
>>> So if the same format of document is not available, then what content *is*
>>> there that gives a picture of where spark.ml is headed?
>>>
>>> 2017-11-29 6:39 GMT-08:00 Stephen Boesch <javadba@gmail.com>:
>>>
>>>> Any further information or thoughts?
>>>>
>>>>
>>>>
>>>> 2017-11-22 15:07 GMT-08:00 Stephen Boesch <javadba@gmail.com>:
>>>>
>>>>> The roadmaps for prior releases (e.g. 1.6, 2.0, 2.1, 2.2) were available:
>>>>>
>>>>> 2.2.0 https://issues.apache.org/jira/browse/SPARK-18813
>>>>>
>>>>> 2.1.0 https://issues.apache.org/jira/browse/SPARK-15581
>>>>> ..
>>>>>
>>>>> It seems those roadmaps were not available per se for 2.3.0 and
>>>>> later? Is there a different mechanism for that info?
>>>>>
>>>>> stephenb
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Joseph Bradley
>>
>> Software Engineer - Machine Learning
>>
>> Databricks, Inc.
>>
>>
>
>


