spark-issues mailing list archives

From "yuhao yang (JIRA)" <>
Subject [jira] [Commented] (SPARK-18755) Add Randomized Grid Search to Spark ML
Date Fri, 27 Oct 2017 06:35:00 GMT


yuhao yang commented on SPARK-18755:

Thanks for sending the update here. 

Feel free to send a PR whenever you like. I'm interested in the topic and can help with the review.
However, since none of the committers have stopped by here yet, I expect the review process will be very

> Add Randomized Grid Search to Spark ML
> --------------------------------------
>                 Key: SPARK-18755
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: yuhao yang
> Randomized Grid Search implements a randomized search over parameters, where each setting is sampled from a distribution over possible parameter values. This has two main benefits over an exhaustive search:
> 1. A budget can be chosen independent of the number of parameters and possible values.
> 2. Adding parameters that do not influence the performance does not decrease efficiency.
> Randomized grid search usually gives results similar to those of an exhaustive search, while its run time is drastically lower.
> For more background, please refer to:
> sklearn:
> As I see it, there are two ways to implement this in Spark:
> 1. Add a searchRatio to ParamGridBuilder and sample directly during build. Only one new public function is required.
> 2. Add a trait RandomizedSearch and create new classes RandomizedCrossValidator and RandomizedTrainValidationSplit, which can be complicated since we need to deal with the models.
> I'd prefer option 1, as it's much simpler and more straightforward. We can support randomized grid search with a minimal change.
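The sampling described in the issue (each setting drawn from a distribution, with a budget that does not depend on the number of parameters or values) can be sketched in plain Python with no Spark dependency. This is a minimal illustration under stated assumptions: the function sample_param_settings and the "space" format are hypothetical, not a Spark or sklearn API, and the parameter names are only example strings borrowed from Spark's LogisticRegression.

```python
import random
from typing import Any, Callable, Dict, List, Sequence, Union

# Hypothetical search-space format: each parameter name maps either to a finite
# list of candidate values (sampled uniformly) or to a callable that draws one
# value from a distribution using the supplied random generator.
Space = Dict[str, Union[Sequence[Any], Callable[[random.Random], Any]]]


def sample_param_settings(space: Space, n_iter: int, seed: int = 42) -> List[Dict[str, Any]]:
    """Hypothetical helper: draw n_iter independent parameter settings.

    The number of settings (the budget) is fixed by n_iter, no matter how many
    parameters or candidate values the space contains.
    """
    rng = random.Random(seed)  # seeded so the sampled settings are reproducible
    settings = []
    for _ in range(n_iter):
        setting = {}
        for name, dist in space.items():
            if callable(dist):
                setting[name] = dist(rng)  # draw from a custom distribution
            else:
                setting[name] = rng.choice(list(dist))  # uniform over a finite list
        settings.append(setting)
    return settings


# Example space; names are illustrative strings, not Spark Param objects.
space = {
    "regParam": lambda rng: 10 ** rng.uniform(-4, 0),  # log-uniform in [1e-4, 1]
    "elasticNetParam": lambda rng: rng.uniform(0.0, 1.0),
    "maxIter": [10, 50, 100],
}

for setting in sample_param_settings(space, n_iter=5):
    print(setting)
```

Because every draw is independent, duplicate settings are possible when all parameters come from small finite lists; an actual implementation might sample without replacement in that case.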
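Option 1 could look roughly like the following sketch, again in plain Python with no Spark dependency. SampledParamGridBuilder and setSearchRatio are hypothetical names; addGrid and build only mimic the shape of ParamGridBuilder's builder methods, using plain strings instead of Param objects. Here build() expands the full Cartesian grid and, when a ratio below 1 is set, keeps a random subset of ceil(ratio * grid size) combinations, sampled without replacement.

```python
import itertools
import math
import random
from typing import Any, Dict, List, Sequence


class SampledParamGridBuilder:
    """Hypothetical stand-in for a ParamGridBuilder with a searchRatio option."""

    def __init__(self) -> None:
        self._grid: Dict[str, List[Any]] = {}
        self._search_ratio = 1.0  # 1.0 means exhaustive grid search
        self._seed = 0

    def addGrid(self, param: str, values: Sequence[Any]) -> "SampledParamGridBuilder":
        self._grid[param] = list(values)
        return self

    def setSearchRatio(self, ratio: float, seed: int = 0) -> "SampledParamGridBuilder":
        if not 0.0 < ratio <= 1.0:
            raise ValueError("searchRatio must be in (0, 1]")
        self._search_ratio = ratio
        self._seed = seed
        return self

    def build(self) -> List[Dict[str, Any]]:
        names = list(self._grid)
        # Expand every combination of values, as an exhaustive grid would.
        full = [
            dict(zip(names, combo))
            for combo in itertools.product(*(self._grid[n] for n in names))
        ]
        if self._search_ratio >= 1.0:
            return full
        # Keep a random subset without replacement, so no setting repeats.
        k = max(1, math.ceil(len(full) * self._search_ratio))
        return random.Random(self._seed).sample(full, k)


grid = (
    SampledParamGridBuilder()
    .addGrid("regParam", [0.01, 0.1, 1.0])
    .addGrid("maxIter", [10, 50, 100])
    .addGrid("elasticNetParam", [0.0, 0.5, 1.0])
    .setSearchRatio(0.2, seed=7)
    .build()
)
print(len(grid))  # 27 combinations * 0.2 = 5.4, rounded up to 6
```

Expanding the full grid before sampling keeps the change small, in the spirit of option 1, but it still materializes every combination, and the number of evaluated settings still grows with the grid size unless the ratio is lowered accordingly.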

This message was sent by Atlassian JIRA
