systemml-issues mailing list archives

From "Mike Dusenberry (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SYSTEMML-1563) Add a distributed synchronous SGD MNIST LeNet example
Date Wed, 26 Apr 2017 22:52:04 GMT

    [ https://issues.apache.org/jira/browse/SYSTEMML-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985693#comment-15985693 ]

Mike Dusenberry commented on SYSTEMML-1563:
-------------------------------------------

cc [~nakul02], [~niketanpansare], [~prithvi_r_s], [~reinwald]

> Add a distributed synchronous SGD MNIST LeNet example
> -----------------------------------------------------
>
>                 Key: SYSTEMML-1563
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1563
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: Mike Dusenberry
>            Assignee: Mike Dusenberry
>
> This aims to add a distributed synchronous SGD MNIST LeNet example.  In distributed
> synchronous SGD, multiple mini-batches are run forward & backward simultaneously, and the
> gradients are aggregated together by addition before the model parameters are updated.
> This is mathematically equivalent to simply using a large mini-batch size, i.e.
> {{new_mini_batch_size = mini_batch_size * number_of_parallel_mini_batches}}.  The benefit
> is that distributed synchronous SGD can make use of multiple devices, i.e. multiple GPUs
> or multiple CPU machines, and thus can speed up training.  More specifically, using an
> effectively larger mini-batch size can yield a more stable gradient in expectation, and a
> larger number of epochs can be run in the same amount of time, both of which lead to
> faster convergence.  Alternatives include various forms of distributed *asynchronous*
> SGD, such as Downpour, Hogwild, etc.  However, a recent paper \[1] from Google Brain /
> OpenAI has found evidence supporting the claim that distributed synchronous SGD can lead
> to faster convergence, particularly if it is extended with the notion of "backup workers"
> as described in the paper.
> We will first aim for distributed synchronous SGD with no backup workers, and then extend
> this to include backup workers.  The MNIST LeNet model will simply serve as an example,
> and this same approach can be extended to more recent models, such as ResNets.
> \[1]: https://arxiv.org/abs/1604.00981
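
Below is a minimal NumPy sketch (not part of the original issue, and not SystemML DML; the toy linear model, worker count, mini-batch size, and learning rate are illustrative assumptions) showing why aggregating per-worker gradients by addition is mathematically equivalent to computing the gradient of one larger mini-batch:

{code:python}
# Hypothetical NumPy sketch of synchronous SGD gradient aggregation.
import numpy as np

np.random.seed(42)

n_features = 4
w = np.random.randn(n_features)  # current model parameters

def summed_gradient(X, y, w):
    """Sum (not mean) of per-example squared-error gradients for a linear model."""
    residual = X @ w - y
    return 2.0 * X.T @ residual

# Simulate 3 parallel workers, each holding its own mini-batch of 8 examples.
num_workers, mini_batch_size = 3, 8
batches = [(np.random.randn(mini_batch_size, n_features),
            np.random.randn(mini_batch_size)) for _ in range(num_workers)]

# Synchronous step: every worker runs forward/backward on its own mini-batch,
# and the resulting gradients are aggregated by addition.
aggregated = sum(summed_gradient(X, y, w) for X, y in batches)
sync_grad = aggregated / (num_workers * mini_batch_size)  # mean over all examples

# Same computation on one large mini-batch of size mini_batch_size * num_workers.
X_all = np.vstack([X for X, _ in batches])
y_all = np.concatenate([y for _, y in batches])
large_grad = summed_gradient(X_all, y_all, w) / len(y_all)

assert np.allclose(sync_grad, large_grad)  # the two gradients are identical

# Single parameter update (learning rate is an arbitrary illustrative value).
w -= 0.01 * sync_grad
{code}

In a SystemML DML implementation, the per-worker forward/backward passes would run in parallel across devices or cluster nodes rather than sequentially as in this sketch, but the aggregation-by-addition step before the parameter update is the same.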



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
