madlib-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [madlib] fmcquillan99 edited a comment on pull request #518: DL: [AutoML] Add support for Hyperopt on top of MOP for MPP + AutoML best-so-far
Date Mon, 05 Oct 2020 17:27:02 GMT

fmcquillan99 edited a comment on pull request #518:
URL: https://github.com/apache/madlib/pull/518#issuecomment-699189947


   (6)
   fmin definition
   https://github.com/hyperopt/hyperopt/wiki/FMin
   fmin(loss, space, algo, max_evals)
   
   Looks like this PR is setting `max_evals = num_models/num_segments` in `get_configs_list()`.
   For one thing, I'm not sure that
   ```
   self.num_workers = get_seg_number() * get_segments_per_host()
   ```
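   gives the total number of workers. On a database with 1 host and 2 segments per host, this returned 4 instead of the expected 2, presumably because `get_seg_number()` already returns the total segment count, so multiplying by `get_segments_per_host()` counts the segments twice. Also, this needs to be consistent with the distribution rules set in the mini-batch preprocessor.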
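A minimal sketch of the suspected double count, assuming `get_seg_number()` already returns the total number of segments across all hosts. The functions below are hypothetical stand-ins for illustration, not MADlib's helpers:

```python
def total_segments(num_hosts: int, segments_per_host: int) -> int:
    """Total number of segments in a hypothetical cluster layout."""
    return num_hosts * segments_per_host


def num_workers_as_in_pr(seg_number: int, segments_per_host: int) -> int:
    """Mirrors `get_seg_number() * get_segments_per_host()`.

    If `seg_number` is already the cluster-wide segment count (an
    assumption), the per-host factor ends up applied a second time.
    """
    return seg_number * segments_per_host


def num_workers_one_per_segment(seg_number: int) -> int:
    """One worker per segment: the total segment count is the worker count."""
    return seg_number


# The 1-host, 2-segments-per-host case described in the comment.
segs = total_segments(num_hosts=1, segments_per_host=2)
print(num_workers_as_in_pr(segs, segments_per_host=2))  # 4
print(num_workers_one_per_segment(segs))                # 2
```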
   
   (7)
   The function `get_configs_list()` may not be distributing the workload correctly across the segments.
   For
   ```
       num_models = 3
       num_workers = 3
   ```
   it returns `[(1, 3)]`, which seems OK. But for
   ```
       num_models = 5
       num_workers = 3
   ```
   it returns `[(1, 5)]`, which does not seem OK: if all 5 configs are grouped together, hyperopt will not get any updates between them. I would have expected `[(1, 3), (4, 5)]`. For
   ```
       num_models = 20
       num_workers = 3
   ```
   it returns `[(1, 4), (5, 8), (9, 11), (12, 14), (15, 17), (18, 20)]`, which means the first two groups each run 4 configs on 3 segments, which does not seem efficient. I would have expected something like `[(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18), (19, 20)]`, which runs 3 configs at a time on the 3 segments.
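A minimal sketch of the expected grouping, using a hypothetical helper `chunk_configs` (not the PR's `get_configs_list()`): consecutive, inclusive, 1-based ranges of at most `num_workers` configs, where only the last range may be smaller.

```python
from typing import List, Tuple


def chunk_configs(num_models: int, num_workers: int) -> List[Tuple[int, int]]:
    """Split model indices 1..num_models into consecutive inclusive
    (start, end) ranges of at most num_workers models each, so that no
    round trains more configs than there are workers."""
    if num_models < 1 or num_workers < 1:
        raise ValueError("num_models and num_workers must be positive")
    return [
        (start, min(start + num_workers - 1, num_models))
        for start in range(1, num_models + 1, num_workers)
    ]


print(chunk_configs(3, 3))   # [(1, 3)]
print(chunk_configs(5, 3))   # [(1, 3), (4, 5)]
print(chunk_configs(20, 3))  # [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18), (19, 20)]
```

With this grouping there are `ceil(num_models / num_workers)` groups, and hyperopt can be updated after each one.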
   
   (8)
   In `find_hyperopt_config()`, confirm that the loss for *each* model is passed to hyperopt after training, and not just the loss of the best model in the group.
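A minimal sketch of the difference, assuming each trained model in a group is available as a dict with hypothetical `config` and `loss` keys (the losses below are made up for illustration). Each result mirrors the `{'loss': ..., 'status': 'ok'}` shape that a hyperopt objective returns; the `config` key is illustrative only.

```python
from typing import Dict, List


def losses_for_hyperopt(group: List[Dict]) -> List[Dict]:
    """Report one result per trained model in the group."""
    return [
        {"config": m["config"], "loss": m["loss"], "status": "ok"}
        for m in group
    ]


def best_only(group: List[Dict]) -> List[Dict]:
    """What the check should rule out: reporting only the best model."""
    best = min(group, key=lambda m: m["loss"])
    return [{"config": best["config"], "loss": best["loss"], "status": "ok"}]


group = [
    {"config": {"lr": 0.01}, "loss": 0.42},
    {"config": {"lr": 0.001}, "loss": 0.37},
    {"config": {"lr": 0.1}, "loss": 0.91},
]
print(len(losses_for_hyperopt(group)))  # 3
print(len(best_only(group)))            # 1
```

Reporting every loss gives hyperopt's search algorithm information about the poor configs too, not just the winner.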
   
   (9)
   defaults
   It seems that hyperband defaults are being used for hyperopt in the case where the user does not specify any parameters. That will probably throw an error. Should there be separate defaults for hyperband and hyperopt, or should these parameters be mandatory?
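One way the "separate defaults" option could look, as a sketch only: defaults are looked up by method, so hyperopt never falls back to hyperband's values. The parameter strings and the `resolve_automl_params` helper are hypothetical placeholders, not MADlib's actual defaults or API.

```python
from typing import Optional

# Hypothetical placeholder defaults, one entry per AutoML method.
AUTOML_DEFAULTS = {
    "hyperband": "R=6, eta=3, skip_last=0",
    "hyperopt": "num_configs=20, num_iterations=5, algorithm=tpe",
}


def resolve_automl_params(method: str, params: Optional[str] = None) -> str:
    """Return the user's params if given, else the defaults for this method.

    Unknown methods raise instead of silently using another method's defaults.
    """
    key = method.strip().lower()
    if key not in AUTOML_DEFAULTS:
        raise ValueError(f"unsupported AutoML method: {method!r}")
    if params is None or not params.strip():
        return AUTOML_DEFAULTS[key]
    return params


print(resolve_automl_params("hyperopt"))
```

Making the parameters mandatory would amount to raising instead of returning a default when `params` is empty.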


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


