spark-issues mailing list archives

From "Peter Rudenko (JIRA)"
Subject [jira] [Commented] (SPARK-7131) Move tree,forest implementation from spark.mllib to
Date Wed, 09 Dec 2015 20:52:10 GMT


Peter Rudenko commented on SPARK-7131:

Please remove the final modifier from the RF and GBM model classes in the ml package. I want to
extend them, set some parameters, and reimplement some functionality (for example, probabilistic
models for GBC).

> Move tree,forest implementation from spark.mllib to
> ------------------------------------------------------------
>                 Key: SPARK-7131
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 1.4.0
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>             Fix For: 1.5.0
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> We want to change and improve the API for trees and ensembles, but we cannot
> change the old API in spark.mllib.  To support the changes we want to make, we should move
> the implementation from spark.mllib to  We will generalize and modify it, but will
> also ensure that we do not change the behavior of the old API.
> There are several steps to this:
> 1. Copy the implementation over to and change the classes to use that
> implementation, rather than calling the spark.mllib implementation.  The current
> tests will ensure that the 2 implementations learn exactly the same models.  Note: This should
> include performance testing to make sure the updated code does not have any regressions. -->
> *UPDATE*: I have run tests using spark-perf, and there were no regressions.
> 2. Remove the spark.mllib implementation, and make the spark.mllib APIs wrappers around
> the implementation.  The tests will again ensure that we do not change any
> 3. Move the unit tests to, and change the spark.mllib unit tests to verify model
> This JIRA is now for step 1 only.  Steps 2 and 3 will be in separate JIRAs.
> After these updates, we can more safely generalize and improve the implementation.
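
The migration plan quoted above (one implementation, with the old API kept as a thin wrapper and tests checking that behavior is unchanged) can be sketched in a few lines. This is only an illustration under stated assumptions: it is written in Python rather than Scala, every name in it (Stump, new_fit_stump, LegacyStumpTrainer, LegacyStumpModel) is hypothetical and not part of Spark, and a toy one-split regression stump stands in for the real tree and ensemble code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (feature value, label)


@dataclass(frozen=True)
class Stump:
    """Toy model: predict `left` when x <= threshold, otherwise `right`."""
    threshold: float
    left: float
    right: float

    def predict(self, x: float) -> float:
        return self.left if x <= self.threshold else self.right


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def new_fit_stump(data: Sequence[Point]) -> Stump:
    """Hypothetical 'new package' trainer: the single place where training logic lives.

    Tries every midpoint between consecutive distinct x values and keeps the
    split with the lowest total squared error.
    """
    if not data:
        raise ValueError("data must not be empty")
    xs = sorted({x for x, _ in data})
    best: Optional[Tuple[float, Stump]] = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        lm, rm = _mean(left), _mean(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, Stump(t, lm, rm))
    if best is None:
        # Only one distinct x value: fall back to a constant prediction.
        m = _mean([y for _, y in data])
        return Stump(xs[0], m, m)
    return best[1]


class LegacyStumpModel:
    """Hypothetical old-API model type; it only holds and forwards to the new model."""

    def __init__(self, stump: Stump) -> None:
        self._stump = stump

    def predict(self, x: float) -> float:
        return self._stump.predict(x)


class LegacyStumpTrainer:
    """Hypothetical old-API entry point, kept so existing callers do not change."""

    @staticmethod
    def train(data: Sequence[Point]) -> LegacyStumpModel:
        # No training logic here: the old API delegates to the new implementation.
        return LegacyStumpModel(new_fit_stump(data))
```

A parity check in the spirit of the tests the description mentions would train through both entry points on the same data and compare predictions point by point. Because the wrapper holds no logic of its own, later changes to the new implementation reach old-API callers automatically, and the parity test is what guards against an accidental behavior change.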

This message was sent by Atlassian JIRA
