spark-reviews mailing list archives

From BryanCutler <...@git.apache.org>
Subject [GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...
Date Thu, 10 Aug 2017 18:22:07 GMT
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132530521
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -417,6 +417,54 @@ def test_logistic_regression_check_thresholds(self):
                 LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
             )
     
    +    @staticmethod
    +    def check_params(test_self, py_stage, check_params_exist=True):
    +        """
    +        Checks common requirements for Params.params:
    +          - set of params exist in Java and Python and are ordered by names
    +          - param parent has the same UID as the object's UID
    +          - default param value from Java matches value in Python
    +          - optionally check if all params from Java also exist in Python
    +        """
    +        py_stage_str = "%s %s" % (type(py_stage), py_stage)
    +        if not hasattr(py_stage, "_to_java"):
    +            return
    +        java_stage = py_stage._to_java()
    +        if java_stage is None:
    +            return
    +        test_self.assertEqual(py_stage.uid, java_stage.uid(), msg=py_stage_str)
    +        if check_params_exist:
    +            param_names = [p.name for p in py_stage.params]
    +            java_params = list(java_stage.params())
    +            java_param_names = [jp.name() for jp in java_params]
    +            test_self.assertEqual(
    +                param_names, sorted(java_param_names),
    +                "Param list in Python does not match Java for %s:\nJava = %s\nPython = %s"
    +                % (py_stage_str, java_param_names, param_names))
    --- End diff --
    
    I also changed the return to a continue on line 454. This loop checks all params, so it
    was meant to skip over random seed params, not exit the check entirely (this is why that
    default value for MLP was missed). I also cleaned up the NaN checks on lines 460-462.
    Before, they only covered Imputer params, but the handling should be the same for any
    param with NaN as its default value.
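
To illustrate the two fixes outside of the test suite, here is a minimal standalone sketch. It is not the code from the PR: `compare_defaults` and the plain dicts standing in for the Python and Java param defaults are hypothetical, as are the param names and values in the usage example.

```python
import math


def compare_defaults(py_defaults, java_defaults):
    """Return the names of params whose Python and Java defaults differ.

    Both arguments are plain dicts (hypothetical stand-ins for the real
    Param objects and their JVM counterparts).
    """
    mismatches = []
    for name, py_default in py_defaults.items():
        if name == "seed":
            # Random seeds legitimately differ, so skip this one param.
            # A `return` here would end the whole check, and every param
            # after the seed would go unverified.
            continue
        java_default = java_defaults[name]
        both_nan = (
            isinstance(py_default, float)
            and isinstance(java_default, float)
            and math.isnan(py_default)
            and math.isnan(java_default)
        )
        if both_nan:
            # NaN != NaN, so two NaN defaults count as equal for any
            # param, not only for one specific stage.
            continue
        if py_default != java_default:
            mismatches.append(name)
    return mismatches


# Hypothetical defaults: "solver" comes after "seed", so an early return
# at the seed would have hidden this mismatch.
py = {"maxIter": 100, "seed": 1, "missingValue": float("nan"), "solver": "l-bfgs"}
java = {"maxIter": 100, "seed": 2, "missingValue": float("nan"), "solver": "gd"}
print(compare_defaults(py, java))
```

With the early return in place of the continue, the function would stop at the seed and never compare the solver entry, which is the same kind of miss described above.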

