spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-8445) MLlib 1.5 Roadmap
Date Tue, 23 Jun 2015 22:01:42 GMT

     [ https://issues.apache.org/jira/browse/SPARK-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-8445:
---------------------------------
    Description: 
We expect to see many MLlib contributors for the 1.5 release. To scale out the development,
we created this master list for MLlib features we plan to have in Spark 1.5. Due to limited
review bandwidth, features appearing on this list will get higher priority for code review.
But feel free to suggest new items to the list in comments. We are experimenting with this
process. Your feedback would be greatly appreciated.

h1. Instructions

h2. For contributors:

* Please read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark carefully.
Code style, documentation, and unit tests are important.
* If you are a first-time Spark contributor, please always start with a starter task (TODO:
add a link) rather than a medium/big feature. In our experience, learning the development
process while working on a big feature usually causes long delays in code review.
* Never work silently. Let everyone know on the corresponding JIRA page when you start working
on a feature. This avoids duplicate work. For small features, you don't need to wait
for the JIRA to be assigned to you.
* For medium/big features or features with dependencies, please get the JIRA assigned to you before
coding and keep the ETA updated on the JIRA. If there is no activity on the JIRA page for
a certain amount of time, the JIRA should be released for other contributors.
* Do not claim multiple (>3) JIRAs at the same time. Try to finish them one after another.
* Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code review greatly helps
improve others' code as well as yours.

h2. For committers:

* Try to break down big features into small and specific JIRA tasks and link them properly.
* Add "starter" label to starter tasks.
* Add a rough estimate to medium/big features and track their progress.

h1. Roadmap (WIP)

h2. Algorithms and performance

* LDA improvements (SPARK-5572)
* Log-linear model for survival analysis (SPARK-8518)
* Improve GLM's scalability on number of features (SPARK-8520)

h2. Pipeline API

* more feature transformers (SPARK-8521)
* k-means (SPARK-7898)
* naive Bayes

h2. Model persistence

* more PMML export (SPARK-8545)
* model save/load (SPARK-4587)
* pipeline persistence (SPARK-6725)

h2. Python API for ML

h2. SparkR API for ML

h2. Documentation


  was:
We expect to see many MLlib contributors for the 1.5 release. To scale out the development,
we created this master list for MLlib features we plan to have in Spark 1.5. Due to limited
review bandwidth, features appearing on this list will get higher priority for code review.
But feel free to suggest new items to the list in comments. We are experimenting with this
process. Your feedback would be greatly appreciated.

h1. Instructions

h2. For contributors:

* Please read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark carefully.
Code style, documentation, and unit tests are important.
* If you are a first-time Spark contributor, please always start with a starter task (TODO:
add a link) rather than a medium/big feature. In our experience, learning the development
process while working on a big feature usually causes long delays in code review.
* Never work silently. Let everyone know on the corresponding JIRA page when you start working
on a feature. This avoids duplicate work. For small features, you don't need to wait
for the JIRA to be assigned to you.
* For medium/big features or features with dependencies, please get the JIRA assigned to you before
coding and keep the ETA updated on the JIRA. If there is no activity on the JIRA page for
a certain amount of time, the JIRA should be released for other contributors.
* Do not claim multiple (>3) JIRAs at the same time. Try to finish them one after another.
* Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code review greatly helps
improve others' code as well as yours.

h2. For committers:

* Try to break down big features into small and specific JIRA tasks and link them properly.
* Add "starter" label to starter tasks.
* Add a rough estimate to medium/big features and track their progress.

h1. Roadmap

h2. Algorithms

h2. Pipeline API

h2. Model persistence

h2. Python API for ML

h2. SparkR API for ML

h2. Documentation



> MLlib 1.5 Roadmap
> -----------------
>
>                 Key: SPARK-8445
>                 URL: https://issues.apache.org/jira/browse/SPARK-8445
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, MLlib
>    Affects Versions: 1.5.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

