systemml-issues mailing list archives

From "Mike Dusenberry (JIRA)" <>
Subject [jira] [Commented] (SYSTEMML-1159) Enable Remote Hyperparameter Tuning
Date Mon, 17 Jul 2017 20:01:03 GMT


Mike Dusenberry commented on SYSTEMML-1159:

[~return_01]  Thanks -- adding HogWild asynchronous SGD would be quite interesting.  However,
this particular JIRA issue refers to *hyperparameters* rather than the model parameters;
HogWild applies to the latter.  If you are interested in adding support for HogWild,
could you please create a new JIRA issue for it and link it to SYSTEMML-540?
SYSTEMML-1563 may also be of interest -- I added a distributed synchronous SGD algorithm,
currently implemented in distributed MNIST LeNet.

> Enable Remote Hyperparameter Tuning
> -----------------------------------
>                 Key: SYSTEMML-1159
>                 URL:
>             Project: SystemML
>          Issue Type: Improvement
>    Affects Versions: SystemML 1.0
>            Reporter: Mike Dusenberry
>            Priority: Blocker
> Training a parameterized machine learning model (such as a large neural net in deep learning)
requires learning a set of ideal model parameters from the data, as well as determining appropriate
hyperparameters (or "settings") for the training process itself.  In the latter case, the
hyperparameters (e.g. learning rate, regularization strength, dropout percentage, model architecture)
cannot be learned from the data, and are instead determined via a search across a space
for each hyperparameter.  For large numbers of hyperparameters (such as in deep learning models),
the current literature points to performing staged, randomized searches over the space
to produce distributions of performance, narrowing the space after each search \[1].  Thus,
for efficient hyperparameter optimization, it is desirable to train several models in parallel,
with each model trained over the full dataset.  For deep learning models, a mini-batch training
approach is currently state-of-the-art, and thus separate models with different hyperparameters
could, conceivably, be easily trained on each of the nodes in a cluster.
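
The staged, randomized search described above could be sketched roughly as follows. This is a minimal Python illustration, not SystemML code: {{toy_score}}, the hyperparameter names, the ranges, and the "keep the top 5 and shrink each range to their span" narrowing rule are all assumptions made for the example, and a real run would replace {{toy_score}} with validation performance from actual training.

```python
import math
import random

def sample(space, rng):
    """Draw one configuration; each range is sampled log-uniformly."""
    return {name: math.exp(rng.uniform(math.log(lo), math.log(hi)))
            for name, (lo, hi) in space.items()}

def narrow(space, top_configs):
    """Shrink each range to the span covered by the best configurations."""
    return {name: (min(c[name] for c in top_configs),
                   max(c[name] for c in top_configs))
            for name in space}

def staged_search(score, space, stages=3, samples=20, keep=5, seed=0):
    """Run several random-search stages, narrowing the space after each."""
    rng = random.Random(seed)
    best = None
    for _ in range(stages):
        configs = [sample(space, rng) for _ in range(samples)]
        trials = sorted(((score(c), c) for c in configs),
                        key=lambda t: t[0], reverse=True)
        if best is None or trials[0][0] > best[0]:
            best = trials[0]
        space = narrow(space, [c for _, c in trials[:keep]])
    return best, space

def toy_score(config):
    """Hypothetical stand-in for validation performance (higher is better)."""
    return (-(math.log10(config["learning_rate"]) + 2) ** 2
            - (math.log10(config["reg"]) + 4) ** 2)

# Assumed search ranges, chosen only for illustration.
initial_space = {"learning_rate": (1e-5, 1e-1), "reg": (1e-6, 1e-2)}
(best_score, best_config), final_space = staged_search(toy_score, initial_space)
```

Each stage here is embarrassingly parallel: the {{samples}} evaluations within a stage do not depend on one another, which is what makes training them on separate nodes attractive.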
> In order to allow for the training of deep learning models, SystemML needs to determine
a solution to enable this scenario with the Spark backend.  Specifically, if the user has
a {{train}} function that takes a set of hyperparameters and trains a model with a mini-batch
approach (and thus is only making use of single-node instructions within the function), the
user should be able to wrap this function with, for example, a remote {{parfor}} construct
that samples hyperparameters and calls the {{train}} function on each machine in parallel.
> To be clear, each model would need access to the entire dataset, and each model would
be trained independently.
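
As a rough illustration of the intended usage pattern (in Python rather than DML, and not a SystemML API), the sketch below samples hyperparameters and calls a {{train}} function for each sample in parallel, with every call receiving the entire dataset. The {{train}} body, the toy dataset, and the sampled ranges are hypothetical, and a local thread pool merely stands in for a remote {{parfor}} over cluster nodes.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def train(data, learning_rate, batch_size, epochs=20, seed=0):
    """Hypothetical train function: mini-batch SGD fitting y = w * x."""
    rng = random.Random(seed)
    rows = list(data)  # each model works on the entire dataset
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(rows)
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad
    loss = sum((w * x - y) ** 2 for x, y in rows) / len(rows)
    return {"learning_rate": learning_rate, "batch_size": batch_size,
            "w": w, "loss": loss}

# Toy dataset (assumption): y = 3x for x in 0.1 .. 1.0.
data = [(x / 10.0, 3.0 * x / 10.0) for x in range(1, 11)]

# Sample hyperparameters at random; the ranges are illustrative only.
rng = random.Random(42)
configs = [{"learning_rate": 10 ** rng.uniform(-3, -0.5),
            "batch_size": rng.choice([2, 5])}
           for _ in range(8)]

# Train the models independently and in parallel; the thread pool is a
# local stand-in for distributing iterations across machines.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda c: train(data, **c), configs))

best = min(results, key=lambda r: r["loss"])
```

Because the iterations share no state and each one only reads the dataset, the loop body has no cross-iteration dependencies, which is the property a parallel {{parfor}} relies on.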
> \[1]:
