spark-dev mailing list archives

From Holden Karau <holden.ka...@gmail.com>
Subject Re: Develop custom Estimator / Transformer for pipeline
Date Thu, 17 Nov 2016 21:28:48 GMT
I've been working on a blog post around this and hope to have it published
early next month 😀

On Nov 17, 2016 10:16 PM, "Joseph Bradley" <joseph@databricks.com> wrote:

Hi Georg,

It's true we need better documentation for this.  I'd recommend checking
out simple algorithms within Spark for examples:
ml.feature.Tokenizer
ml.regression.IsotonicRegression
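For reference, here is a minimal sketch of a Tokenizer-style Transformer defined outside org.apache.spark. It assumes the Spark 2.x (or later) ML API and extends UnaryTransformer, the same base class that ml.feature.Tokenizer uses. The class name UpperCaser and the demo object are hypothetical, for illustration only.

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, StringType}

// Hypothetical example stage. In a real library this would live in your own
// package (e.g. com.example.ml), not under org.apache.spark.
// Modeled on ml.feature.Tokenizer, which also extends UnaryTransformer.
class UpperCaser(override val uid: String)
    extends UnaryTransformer[String, String, UpperCaser] {

  // Auxiliary constructor that generates a unique id, as Spark's own stages do.
  def this() = this(Identifiable.randomUID("upperCaser"))

  // Per-row function applied to the input column.
  override protected def createTransformFunc: String => String = _.toUpperCase

  // Fail fast during schema validation if the input column is not a string.
  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == StringType, s"Input type must be StringType but got $inputType.")

  // Type of the column this stage appends.
  override protected def outputDataType: DataType = StringType
}

object UpperCaserDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("upper-caser").getOrCreate()
    import spark.implicits._

    val df = Seq("hello spark", "custom stage").toDF("text")
    // setInputCol / setOutputCol are inherited from UnaryTransformer.
    val upper = new UpperCaser().setInputCol("text").setOutputCol("upper")
    upper.transform(df).show(truncate = false)

    spark.stop()
  }
}
```

Since UnaryTransformer already provides the inputCol/outputCol params, their setters, and copy, the subclass only has to supply the row function and the output type.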

You should not need to put your library in Spark's namespace.  The shared
Params in SPARK-7146 are not necessary to create a custom algorithm; they
are just niceties.
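To illustrate this point, here is a minimal sketch of an Estimator/Model pair that lives outside Spark's namespace and declares its own Params locally rather than mixing in the shared Params from SPARK-7146. It loosely follows the Estimator-produces-Model pattern used by IsotonicRegression. It assumes the Spark 2.x (or later) API, where fit takes a Dataset[_]. MeanCenterer, MeanCentererModel, and their params are hypothetical names.

```scala
import org.apache.spark.ml.{Estimator, Model, Pipeline}
import org.apache.spark.ml.param.{Param, ParamMap, Params}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{avg, col}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical params declared locally instead of using Spark's shared param traits.
trait MeanCentererParams extends Params {
  final val inputCol = new Param[String](this, "inputCol", "input column name")
  final val outputCol = new Param[String](this, "outputCol", "output column name")
  def getInputCol: String = $(inputCol)
  def getOutputCol: String = $(outputCol)

  // Shared by the Estimator and the Model: check input and append the output column.
  protected def validateAndTransformSchema(schema: StructType): StructType = {
    require(schema($(inputCol)).dataType == DoubleType, s"Column ${$(inputCol)} must be DoubleType")
    require(!schema.fieldNames.contains($(outputCol)), s"Column ${$(outputCol)} already exists")
    StructType(schema.fields :+ StructField($(outputCol), DoubleType, nullable = true))
  }
}

// Estimator: learns the mean of the input column.
class MeanCenterer(override val uid: String)
    extends Estimator[MeanCentererModel] with MeanCentererParams {
  def this() = this(Identifiable.randomUID("meanCenterer"))
  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  override def fit(dataset: Dataset[_]): MeanCentererModel = {
    transformSchema(dataset.schema)
    val mean = dataset.select(avg(col($(inputCol)))).head().getDouble(0)
    // Copy the param values set on the Estimator onto the fitted Model.
    copyValues(new MeanCentererModel(uid, mean).setParent(this))
  }
  override def transformSchema(schema: StructType): StructType = validateAndTransformSchema(schema)
  override def copy(extra: ParamMap): MeanCenterer = defaultCopy(extra)
}

// Model: subtracts the learned mean from the input column.
class MeanCentererModel(override val uid: String, val mean: Double)
    extends Model[MeanCentererModel] with MeanCentererParams {
  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema)
    dataset.withColumn($(outputCol), col($(inputCol)) - mean)
  }
  override def transformSchema(schema: StructType): StructType = validateAndTransformSchema(schema)
  override def copy(extra: ParamMap): MeanCentererModel =
    copyValues(new MeanCentererModel(uid, mean), extra).setParent(parent)
}

object MeanCentererDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("mean-centerer").getOrCreate()
    import spark.implicits._
    val df = Seq(1.0, 3.0, 5.0).toDF("x")
    // The custom stage plugs into a regular Pipeline like any built-in stage.
    val centerer = new MeanCenterer().setInputCol("x").setOutputCol("xCentered")
    val pipelineModel = new Pipeline().setStages(Array(centerer)).fit(df)
    pipelineModel.transform(df).show()
    spark.stop()
  }
}
```

This sketch does not cover persistence (MLWritable / MLReadable), which a library intended for saved pipelines would also need.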

Though there aren't great docs yet, you should be able to follow existing
examples.  And I'd like to add more docs in the future!

Good luck,
Joseph

On Wed, Nov 16, 2016 at 6:29 AM, Georg Heiler <georg.kf.heiler@gmail.com>
wrote:

> Hi,
>
> I want to develop a library with custom Estimators / Transformers for
> Spark. So far I have not found much documentation, but
> http://stackoverflow.com/questions/37270446/how-to-roll-a-custom-estimator-in-pyspark-mllib
> suggests that:
> Generally speaking, there is no documentation because, as of Spark 1.6 /
> 2.0, most of the related API is not intended to be public. It should change
> in Spark 2.1.0 (see SPARK-7146
> <https://issues.apache.org/jira/browse/SPARK-7146>).
>
> Where can I find documentation today?
> Is it true that my library would have to reside in Spark's namespace,
> similar to https://github.com/collectivemedia/spark-ext, to use all
> the handy functionality?
>
> Kind Regards,
> Georg
>
