spark-reviews mailing list archives

From holdenk <>
Subject [GitHub] spark pull request: [SPARK-7675][ML][PYSpark] sparkml params type ...
Date Tue, 10 Nov 2015 01:41:52 GMT
GitHub user holdenk opened a pull request:

    [SPARK-7675][ML][PYSpark] sparkml params type conversion

    From JIRA:
    Currently, PySpark wrappers for Scala classes are brittle when accepting Param
types. E.g., Normalizer's "p" param cannot be set to "2" (an integer); it must be set to "2.0"
(a float). Fixing this is not trivial since there does not appear to be a natural place to
insert the conversion before Python wrappers call Java's Params setter method.
    A possible fix will be to include a method "_checkType" to PySpark's Param class which
checks the type, prints an error if needed, and converts types when relevant (e.g., int to
float, or scipy matrix to array). The Java wrapper method which copies params to Scala can
call this method when available.
    This fix instead checks the types at set time, since I think failing sooner is better,
but I can switch it to check at copy time if that would be preferable. So far this only
converts int to float; other conversions (like scipy matrix to array) are left for the future.
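
A minimal sketch of the set-time conversion described above, for illustration only and not the code in this pull request. The `Param` and `Params` classes, the `expected_type` attribute, and the `check_type` method are hypothetical stand-ins, and nothing here touches the real PySpark API or the Java wrapper. It assumes a param optionally declares an expected Python type and that the only supported conversion is int to float.

```python
class Param:
    """Hypothetical stand-in for an ML param that declares an expected type."""

    def __init__(self, name, doc, expected_type=None):
        self.name = name
        self.doc = doc
        self.expected_type = expected_type

    def check_type(self, value):
        """Return value, converted if needed, or raise TypeError."""
        # No declared type, or the value already matches: pass it through.
        if self.expected_type is None or isinstance(value, self.expected_type):
            return value
        # Only int -> float is converted. bool is excluded explicitly
        # because bool is a subclass of int in Python.
        if (self.expected_type is float
                and isinstance(value, int)
                and not isinstance(value, bool)):
            return float(value)
        raise TypeError(
            "Param %s expects %s but got %s"
            % (self.name, self.expected_type.__name__, type(value).__name__))


class Params:
    """Hypothetical holder that validates values when they are set."""

    def __init__(self):
        self._param_map = {}

    def set(self, param, value):
        # Checking here (set time) surfaces a bad value immediately,
        # rather than later when params are copied to the JVM side.
        self._param_map[param.name] = param.check_type(value)
        return self

    def get(self, param):
        return self._param_map[param.name]


# Usage: an int supplied for a float param is stored as a float.
p = Param("p", "the p norm value", float)
params = Params().set(p, 2)
stored = params.get(p)
```

Moving the `check_type` call out of `set` and into whatever routine copies params to Scala would give the copy-time variant mentioned in the JIRA description.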

You can merge this pull request into a Git repository by running:

    $ git pull SPARK-7675-PySpark-sparkml-Params-type-conversion

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9581

commit ff688c06fbcde6fa5f050e0c1d4a857b2323e0da
Author: Holden Karau <>
Date:   2015-11-06T19:48:37Z

    Start work on adding some basic type information so we can handle ints showing up and
convert them to floats for ml's params

commit 2c4aea51c2f2ab04b8b5288c722a81e826dadc82
Author: Holden Karau <>
Date:   2015-11-06T19:54:31Z

    Explicitly specify no default params and take types off of decision tree params for now

commit c6a819adbb884fcb3d258e905ff927b6b0d51fa9
Author: Holden Karau <>
Date:   2015-11-06T21:29:53Z

    re-generate and fix how we were formatting the type names

commit b22b11233277e89c51692958c58a9735e043b483
Author: Holden Karau <>
Date:   2015-11-08T06:11:42Z

    Merge branch 'master' into SPARK-7675-PySpark-sparkml-Params-type-conversion

commit 26cda87d64ad0c92772f41938200691c28f7e1b2
Author: Holden Karau <>
Date:   2015-11-09T23:04:41Z

    Update a bit

commit d35d6dbf2225e346f2191cfe92552c4d77fb7d95
Author: Holden Karau <>
Date:   2015-11-09T23:06:20Z

    Merge branch 'master' into SPARK-7675-PySpark-sparkml-Params-type-conversion

commit fd876c2c91d398ff4afcb4fffe7a1120e9999582
Author: Holden Karau <>
Date:   2015-11-10T01:19:47Z

    Some quick progress

commit cc1ad2dec5e21a079b004b9b683ee5dc850b4c11
Author: Holden Karau <>
Date:   2015-11-10T01:33:06Z

    Switch to strings

commit 9138fba068f2eac34419f4e9d95fdf47fc6d72ab
Author: Holden Karau <>
Date:   2015-11-10T01:38:48Z

    pep8 fixes

