Mailing-List: contact issues-help@systemml.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@systemml.apache.org
Date: Tue, 23 May 2017 00:03:04 +0000 (UTC)
From: "Mike Dusenberry (JIRA)" <jira@apache.org>
To: issues@systemml.incubator.apache.org
Message-ID: <JIRA.13067193.1493246737000.267913.1495497784072@Atlassian.JIRA>
In-Reply-To: <JIRA.13067193.1493246737000@Atlassian.JIRA>
References: <JIRA.13067193.1493246737000@Atlassian.JIRA> <JIRA.13067193.1493246737251@jira-lw-us.apache.org>
Subject: [jira] [Closed] (SYSTEMML-1563) Add a distributed synchronous SGD
 MNIST LeNet example
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
archived-at: Tue, 23 May 2017 00:03:07 -0000


     [ https://issues.apache.org/jira/browse/SYSTEMML-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Dusenberry closed SYSTEMML-1563.
-------------------------------------

> Add a distributed synchronous SGD MNIST LeNet example
> -----------------------------------------------------
>
>                 Key: SYSTEMML-1563
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1563
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: Mike Dusenberry
>            Assignee: Mike Dusenberry
>             Fix For: SystemML 1.0
>
>
> This aims to add a *distributed synchronous SGD* MNIST LeNet example.  In
> distributed synchronous SGD, multiple mini-batches are run forward &
> backward simultaneously, and the gradients are aggregated together by
> addition before the model parameters are updated.  This is mathematically
> equivalent to simply using a large mini-batch size, i.e.
> {{new_mini_batch_size = mini_batch_size * number_of_parallel_mini_batches}}.
> The benefit is that distributed synchronous SGD can make use of multiple
> devices, i.e. multiple GPUs or multiple CPU machines, and thus can reduce
> training time.  More specifically, using an effectively larger mini-batch
> size can yield a more stable gradient in expectation, and a larger number
> of epochs can be run in the same amount of time, both of which lead to
> faster convergence.  Alternatives include various forms of distributed
> _asynchronous_ SGD, such as Downpour, Hogwild, etc.  However, a recent
> paper \[1] from Google Brain / OpenAI has found evidence supporting the
> claim that distributed synchronous SGD can lead to faster convergence,
> particularly if it is extended with the notion of "backup workers" as
> described in the paper.
> We will first aim for distributed synchronous SGD with no backup workers,
> and then extend this to include backup workers.  The MNIST LeNet model
> will simply serve as an example, and this same approach can be extended to
> more recent models, such as ResNets.
> \[1]: https://arxiv.org/abs/1604.00981
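
The equivalence claimed in the description (adding the gradients of several parallel mini-batches gives the same update as one large mini-batch) can be checked with a minimal sketch. This is not the SystemML implementation: it is plain Python on a toy one-dimensional linear model, and the names grad_sum, sync_sgd_step and large_batch_step are hypothetical. It also assumes the summed gradient is divided by the total number of examples before the update, a normalization the description does not specify.

```python
import random


def grad_sum(w, b, batch):
    """Summed (not averaged) squared-error gradient of y ~ w*x + b over a batch."""
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb


def sync_sgd_step(w, b, batches, lr):
    """One synchronous step over several mini-batches.

    Every "worker" evaluates its gradient at the SAME parameters (w, b).
    The gradients are added together, and only then is a single update made.
    Assumption: the sum is normalized by the total example count.
    """
    # In a real system these calls would run in parallel on separate devices.
    grads = [grad_sum(w, b, batch) for batch in batches]
    gw = sum(g[0] for g in grads)
    gb = sum(g[1] for g in grads)
    n = sum(len(batch) for batch in batches)
    return w - lr * gw / n, b - lr * gb / n


def large_batch_step(w, b, batch, lr):
    """One ordinary SGD step on a single (large) mini-batch."""
    gw, gb = grad_sum(w, b, batch)
    n = len(batch)
    return w - lr * gw / n, b - lr * gb / n


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(32)]
    data = [(x, 3.0 * x + 1.0) for x in xs]
    # 4 parallel mini-batches of 8 examples, i.e. an effective batch of 32.
    batches = [data[i:i + 8] for i in range(0, len(data), 8)]
    print(sync_sgd_step(0.5, -0.2, batches, lr=0.1))
    print(large_batch_step(0.5, -0.2, data, lr=0.1))
```

The two steps agree up to floating-point rounding because the gradient of a summed loss is the sum of the per-batch gradients, provided all workers start from the same parameters. That shared starting point is what distinguishes the synchronous scheme from the asynchronous alternatives mentioned above.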

