spark-reviews mailing list archives

From yinxusen <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-14706] Python ML persistence integratio...
Date Fri, 22 Apr 2016 07:04:07 GMT
GitHub user yinxusen opened a pull request:

    https://github.com/apache/spark/pull/12604

    [SPARK-14706] Python ML persistence integration test

    ## What changes were proposed in this pull request?
    
    This patch adds integration tests for Python ML persistence.
    
    - Add persistence tests for CrossValidator(TrainValidationSplit(LogisticRegression)) and
    TrainValidationSplit(CrossValidator(LogisticRegression)).
    
    - Extend `_compare_pipelines` to also check CrossValidator, CrossValidatorModel, TrainValidationSplit,
    TrainValidationSplitModel, OneVsRest, and OneVsRestModel.
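
The comparison of nested meta-algorithms described above can be pictured as a recursive check. The following is only a minimal sketch in plain Python, not the PR's `_compare_pipelines`: the `Stage` class, its `kind`, `params`, and `inner` fields, and the `same_stage` helper are all hypothetical names introduced for illustration.

```python
class Stage:
    """Hypothetical stand-in for an ML stage; this is not a PySpark class."""

    def __init__(self, kind, params=None, inner=None):
        self.kind = kind              # e.g. "CrossValidator"
        self.params = params or {}    # plain parameter values
        self.inner = inner            # wrapped estimator, or None


def same_stage(a, b):
    """Compare two stages, descending into wrapped estimators."""
    if a.kind != b.kind or a.params != b.params:
        return False
    if a.inner is None or b.inner is None:
        # Equal only when neither side wraps another estimator.
        return a.inner is None and b.inner is None
    return same_stage(a.inner, b.inner)


lr = Stage("LogisticRegression", {"maxIter": 10})
tvs = Stage("TrainValidationSplit", {"trainRatio": 0.75}, inner=lr)
cv = Stage("CrossValidator", {"numFolds": 3}, inner=tvs)

# Simulate a reload that silently changed a parameter two levels down.
reloaded = Stage(
    "CrossValidator", {"numFolds": 3},
    inner=Stage("TrainValidationSplit", {"trainRatio": 0.75},
                inner=Stage("LogisticRegression", {"maxIter": 5})),
)
```

A check that stops at the outermost stage would treat `cv` and `reloaded` as equal, which is why the nested cases need their own tests.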
    
    **Bugs found and fixed in this PR:**
    
    - OneVsRest, CrossValidator and TrainValidationSplit need `_transfer_param_map_to_java`
    and `_transfer_param_map_from_java`; otherwise they can't be used as estimators in tuning.
    
    - 
    ```python
    lr = LogisticRegression()
    lr.getThresholds()
    ```
    produces `keyNotFoundError` because `thresholds` is neither set nor present in `_defaultParamMap`,
    which caused the previous JavaParams parameter equality check to fail.
    
    - `trainRatio` in `TrainValidationSplit` should have a float type converter.
    
    - Tuning a `OneVsRest` whose `classifier` appears in `estimatorParamMaps` fails to persist.
    For example,
    ```python
    ovr = OneVsRest()
    epms = [{ovr.classifier: xxxx}, {ovr.classifier: xxx}]
    cv = CrossValidator(estimator=ovr, estimatorParamMaps=epms, ...)
    cv.load()
    ```
    fails because `classifier` cannot be serialized via JSON.
    
    The last issue is not trivial, so I left it unresolved in this PR.
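
To illustrate why the last issue is harder than the others, here is a small sketch using only Python's standard `json` module. It assumes, as the description above states, that parameter values are written out as JSON; `FakeClassifier` and `try_dump` are hypothetical names and do not appear in Spark.

```python
import json


class FakeClassifier:
    """Hypothetical stand-in for an estimator object used as a param value."""


def try_dump(param_map):
    """Return the JSON text, or None if some value is not JSON-serializable."""
    try:
        return json.dumps(param_map)
    except TypeError:
        # json.dumps raises TypeError for objects it has no encoding for.
        return None


plain = {"regParam": 0.1, "maxIter": 10}      # numbers encode directly
nested = {"classifier": FakeClassifier()}     # an object does not
```

Plain numeric values round-trip through JSON, while a param map whose value is itself an estimator object has no default encoding, so it would need a separate persistence path.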
    
    ## How was this patch tested?
    
    The patch is tested with Python unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yinxusen/spark SPARK-14706

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12604.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12604
    
----
commit 52046d496a9e97cbb948d67ba6f0e78923b30732
Author: Xusen Yin <yinxusen@gmail.com>
Date:   2016-04-19T01:52:58Z

    add tests for meta-algorithms persistence

commit b843673f0ae33c6f347896837e47c9eb37260f19
Author: Xusen Yin <yinxusen@gmail.com>
Date:   2016-04-20T18:48:01Z

    fix none with sqlContext other than sc.parallelize.toDF

commit f5c14d5d9c83eb72e04a360a9941d81b74ca8f3d
Author: yinxusen <yinxusen@gmail.com>
Date:   2016-04-21T18:18:08Z

    add seed in transfer to/from java

commit b62c6ab4e1c8a1135292cabdfa3003d1a65e0965
Author: yinxusen <yinxusen@gmail.com>
Date:   2016-04-21T18:35:34Z

    add CrossValidatorParams and TrainValidationSplit for save/load consistency

commit 73835dd8cac2ced02a9f251f50cbc9457bfe6c41
Author: yinxusen <yinxusen@gmail.com>
Date:   2016-04-21T18:40:18Z

    add transfer param map for TrainValidateSplit

commit 60cfe38c6b8e34c87da3be9767f850cfffe3a55e
Author: yinxusen <yinxusen@gmail.com>
Date:   2016-04-21T20:46:42Z

    add transfer param map for OneVsRest/Model

commit 842e6064b3d66a38fb618bafc88bebe4c1a4f51e
Author: yinxusen <yinxusen@gmail.com>
Date:   2016-04-22T05:58:17Z

    fix cv wraps tvs and tvs wraps cv

commit fa570c663fd07cb520ac8c05f98887e6c0cf4ad2
Author: yinxusen <yinxusen@gmail.com>
Date:   2016-04-22T06:14:35Z

    fix transfer param map for ovr

commit 40d48baaa13c7a014116d4a1845c84adb024b22c
Author: yinxusen <yinxusen@gmail.com>
Date:   2016-04-22T06:18:18Z

    merge with master

commit 622e5647a271e68854d75aafa10972a89585df56
Author: yinxusen <yinxusen@gmail.com>
Date:   2016-04-22T06:35:04Z

    fix style

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

