systemml-issues mailing list archives

From "Mike Dusenberry (JIRA)"
Subject [jira] [Created] (SYSTEMML-1760) Improve engine robustness of distributed SGD training
Date Wed, 12 Jul 2017 00:27:00 GMT
Mike Dusenberry created SYSTEMML-1760:

             Summary: Improve engine robustness of distributed SGD training
                 Key: SYSTEMML-1760
             Project: SystemML
          Issue Type: Improvement
            Reporter: Mike Dusenberry
            Assignee: Fei Hu

Currently, we have a mathematical framework in place for training with distributed SGD in
a distributed MNIST LeNet example.
This task aims to run this at scale to determine (1) the current behavior of the engine
(i.e., does the optimizer actually run this in a distributed fashion?), and (2) ways to improve
the robustness and performance for this scenario.  The distributed SGD framework from this
example has already been ported into Caffe2DML, and thus improvements made for this task will
directly benefit our efforts towards distributed training of Caffe models (and Keras in the

