spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-7443) MLlib 1.4 QA plan
Date Mon, 18 May 2015 15:55:06 GMT

     [ https://issues.apache.org/jira/browse/SPARK-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-7443:
---------------------------------
    Description: 
TODO: create JIRAs for each task and assign them accordingly.

h2. API

* Check API compliance using java-compliance-checker (SPARK-7458)

* Audit new public APIs (from the generated HTML docs)
** Scala (do not forget to check the object doc) (SPARK-7537)
** Java compatibility (SPARK-7529)
** Python API coverage (SPARK-7536)

* Audit Pipeline APIs (SPARK-7535)

* Graduate spark.ml from alpha
** Remove AlphaComponent annotations
** Remove MiMa excludes for spark.ml
** Mark concrete classes final wherever reasonable

h2. Algorithms and performance

*Performance*
* _List any other missing performance tests from spark-perf here_
* LDA online/EM (SPARK-7455)
* ElasticNet for linear regression and logistic regression (SPARK-7456)
* Bernoulli naive Bayes (SPARK-7453)
* Power iteration clustering (PIC) (SPARK-7454)
* ALS.recommendAll (SPARK-7457)
* perf-tests in Python (SPARK-7539)

*Correctness*
* PMML
** scoring using PMML evaluator vs. MLlib models (SPARK-7540)
* model save/load (SPARK-7541)

h2. Documentation and example code

* Create JIRAs for the user guide for each new algorithm and assign them to the corresponding author. Link them here as "requires".
** Now that spark.ml has algorithms that are not in spark.mllib, we should start adding subsections for the spark.ml API as needed. We can follow the structure of the spark.mllib user guide.
*** The spark.ml user guide can provide: (a) code examples and (b) information on algorithms that do not exist in spark.mllib.
*** We should not duplicate information in the spark.ml guides. Since spark.mllib is still the primary API, we should link to the corresponding algorithms in the spark.mllib user guide for more information.

* Create example code for major components. Link them here as "requires".
** Cross-validation in Python
** Pipeline with complex feature transformations (Scala/Java/Python)
** Elastic net (possibly with cross-validation)
** Kernel density estimation

  was:
TODO: create JIRAs for each task and assign them accordingly.

h2. API

* Check API compliance using java-compliance-checker (SPARK-7458)

* Audit new public APIs (from the generated HTML docs)
** Scala (do not forget to check the object doc) (SPARK-7537)
** Java compatibility (SPARK-7529)
** Python API coverage (SPARK-7536)

* Audit Pipeline APIs (SPARK-7535)

* Graduate spark.ml from alpha
** Remove AlphaComponent annotations
** Remove MiMa excludes for spark.ml
** Mark concrete classes final wherever reasonable

h2. Algorithms and performance

*Performance*
* _List any other missing performance tests from spark-perf here_
* LDA online/EM (SPARK-7455)
* ElasticNet for linear regression and logistic regression (SPARK-7456)
* Bernoulli naive Bayes (SPARK-7453)
* Power iteration clustering (PIC) (SPARK-7454)
* ALS.recommendAll (SPARK-7457)
* perf-tests in Python (SPARK-7539)

*Correctness*
* PMML
** scoring using PMML evaluator vs. MLlib models (SPARK-7540)
* model save/load (SPARK-7541)

h2. Documentation and example code

* Create JIRAs for the user guide for each new algorithm and assign them to the corresponding author. Link them here as "requires".
** Now that spark.ml has algorithms that are not in spark.mllib, we should start adding subsections for the spark.ml API as needed. We can follow the structure of the spark.mllib user guide.
*** The spark.ml user guide can provide: (a) code examples and (b) information on algorithms that do not exist in spark.mllib.
*** We should not duplicate information in the spark.ml guides. Since spark.mllib is still the primary API, we should link to the corresponding algorithms in the spark.mllib user guide for more information.

* Create example code for major components. Link them here as "requires".
** Cross-validation in Python
** Pipeline with complex feature transformations (Scala/Java/Python)
** Elastic net (possibly with cross-validation)


> MLlib 1.4 QA plan
> -----------------
>
>                 Key: SPARK-7443
>                 URL: https://issues.apache.org/jira/browse/SPARK-7443
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, MLlib
>    Affects Versions: 1.4.0
>            Reporter: Xiangrui Meng
>            Assignee: Joseph K. Bradley
>            Priority: Critical




