spark-dev mailing list archives

From Joseph Bradley <jos...@databricks.com>
Subject Re: [mllib] Which is the correct package to add a new algorithm?
Date Fri, 28 Nov 2014 20:15:55 GMT
Hi Yu,
Thanks for bringing it up for clarification.  Here's a rough draft of a
section for the soon-to-be-updated programming guide, which will have more
info on the spark.ml package.
Joseph

## spark.mllib vs. spark.ml

Spark 1.2 will include a new machine learning package called spark.ml,
currently an alpha component but potentially a successor to spark.mllib.
The spark.ml package aims to replace the old APIs with a cleaner, more
uniform set of APIs that helps users build full machine learning
pipelines.

(More info about pipelines will be included in the updated programming
guide for Spark 1.2.)

### Development plan

With Spark 1.2, spark.mllib is still the primary machine learning package,
and spark.ml is an alpha component for testing the new API.  The primary
parts of this API are:
* the Pipeline concept for constructing complicated ML workflows consisting
of Estimators and Transformers,
* SchemaRDD as an ML dataset,
* and constructs for specifying parameters for algorithms and pipelines.
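The three pieces above can be sketched together in one short example. This is an illustration against the alpha API as it stands in the 1.2 branch, so the specific classes (Tokenizer, HashingTF, LogisticRegression) and setter names may change before release; `trainingData` is assumed to be a SchemaRDD with `text` and `label` columns:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Two Transformers: each maps a SchemaRDD to a new SchemaRDD with an added column.
val tokenizer = new Tokenizer()
  .setInputCol("text")
  .setOutputCol("words")
val hashingTF = new HashingTF()
  .setInputCol(tokenizer.getOutputCol)
  .setOutputCol("features")

// An Estimator: fitting it produces a Transformer (the fitted model).
// Parameters are specified via setters (or, alternatively, a ParamMap at fit time).
val lr = new LogisticRegression()
  .setMaxIter(10)

// The Pipeline chains the stages into a single Estimator.
val pipeline = new Pipeline()
  .setStages(Array(tokenizer, hashingTF, lr))

// trainingData: SchemaRDD — the ML dataset type in the alpha API.
val model = pipeline.fit(trainingData)
```

Fitting the whole Pipeline at once is the point of the design: the resulting PipelineModel applies the same sequence of transformations at prediction time, so feature preparation and model training stay consistent.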

If all goes well, spark.ml will become the primary ML package at the time
of the Spark 1.3 release.  Initially, simple wrappers will be used to port
algorithms to spark.ml, but eventually, code will be moved to spark.ml and
spark.mllib will be deprecated.

### Advice to developers

During the next development cycle, new algorithms should be contributed to
spark.mllib.  Optionally, wrappers for new (and old) algorithms can be
contributed to spark.ml.

Users will be able to use algorithms from either of the two packages; the
only difficulty will be the differences in APIs between the two packages.
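To make that API difference concrete, here is a rough side-by-side sketch of training the same kind of model through each package (names as of the 1.2 branch; the spark.ml side is alpha and subject to change, and `labeledPointRDD` / `dataset` are assumed inputs of the appropriate types):

```scala
// spark.mllib: RDD-based input, static train() entry points,
// parameters passed as method arguments.
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
// labeledPointRDD: RDD[LabeledPoint]
val mllibModel = LogisticRegressionWithSGD.train(labeledPointRDD, 100)

// spark.ml: SchemaRDD-based input, Estimator instances,
// parameters specified via setters before calling fit().
import org.apache.spark.ml.classification.LogisticRegression
// dataset: SchemaRDD with "features" and "label" columns
val mlModel = new LogisticRegression()
  .setMaxIter(100)
  .fit(dataset)
```

The algorithms behind the two calls can be the same; what differs is how data and parameters are passed, which is the main adjustment users will face when moving between the packages.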


On Thu, Nov 27, 2014 at 6:41 AM, Yu Ishikawa <yuu.ishikawa+spark@gmail.com>
wrote:

> Hi all,
>
> The Spark ML alpha version exists in the current master branch on GitHub.
> If we want to add new machine learning algorithms or modify algorithms
> that already exist, which package should we implement them in:
> org.apache.spark.mllib or org.apache.spark.ml?
>
> thanks,
> Yu
>
>
>
> -----
> -- Yu Ishikawa
> --
