horn-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HORN-27) Effective Parallel Training of Large Deep DropConnect Neural Networks
Date Wed, 01 Jun 2016 10:54:59 GMT

     [ https://issues.apache.org/jira/browse/HORN-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HORN-27:
    Issue Type: Task  (was: Bug)

> Effective Parallel Training of Large Deep DropConnect Neural Networks
> ---------------------------------------------------------------------
>                 Key: HORN-27
>                 URL: https://issues.apache.org/jira/browse/HORN-27
>             Project: Apache Horn
>          Issue Type: Task
>            Reporter: Edward J. Yoon
> As you might already know, training large-scale deep ANN architectures, such as Convolutional
Neural Nets (CNNs) and Recurrent Neural Nets (RNNs), is challenging: the training of large
models has to be parallelized, and it is also quite prone to overfitting due to the large size
of the network, even with large data sets. DropOut [1] and DropConnect [2] are popular
techniques for regularizing artificial neural networks by randomly dropping hidden units
(DropOut) or individual connections (DropConnect) during training.
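
To make the difference between the two regularizers concrete, here is a minimal sketch of a
single dense layer in plain Python. The function names and the list-of-rows weight layout are
assumptions made for illustration, not Apache Horn code; the activation function and the
rescaling usually applied at inference time are left out to keep it short.

```python
import random

def dropout_forward(x, W, p, rng):
    """DropOut: zero each output unit with probability p (one mask bit per unit)."""
    out = []
    for row in W:  # one row of weights per output unit
        keep = rng.random() >= p
        act = sum(w * xi for w, xi in zip(row, x))
        out.append(act if keep else 0.0)
    return out

def dropconnect_forward(x, W, p, rng):
    """DropConnect: zero each weight with probability p (one mask bit per connection)."""
    out = []
    for row in W:
        # A connection contributes only if its own mask bit survives.
        act = sum(w * xi for w, xi in zip(row, x) if rng.random() >= p)
        out.append(act)
    return out

if __name__ == "__main__":
    rng = random.Random(42)
    x = [1.0, 2.0, 3.0]
    W = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    print(dropout_forward(x, W, 0.5, rng))
    print(dropconnect_forward(x, W, 0.5, rng))
```

With p = 0 both functions reduce to an ordinary matrix-vector product, and with p = 1 both
return all zeros; in between, DropOut removes whole units while DropConnect thins out
individual weights.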

> This is exactly why we are doing this project. At the moment it is only a rough idea:
I'm thinking about an ensemble concept combining DropOut and DropConnect that allows distributed
parallel training with small communication requirements. The core idea is to create many model
replicas on different subsets of the data, and to partition each network model randomly across
multiple processors, thus dropping connections and achieving locality of computation at the
same time.
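
One possible reading of "partition randomly, thus dropping connections" can be sketched as
follows. The helper names and the uniform random assignment are assumptions for illustration
only; the issue text does not specify how the partitioning is actually done.

```python
import random

def random_partition(n_units, n_parts, rng):
    """Assign each unit index to one of n_parts processors uniformly at random."""
    return [rng.randrange(n_parts) for _ in range(n_units)]

def partition_mask(in_parts, out_parts):
    """mask[j][i] is 1 if input unit i and output unit j sit on the same processor."""
    return [[1 if pi == pj else 0 for pi in in_parts] for pj in out_parts]

def kept_fraction(mask):
    """Fraction of connections that survive, i.e. that need no cross-processor traffic."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total

if __name__ == "__main__":
    rng = random.Random(7)
    in_parts = random_partition(8, 2, rng)   # 8 input units over 2 processors
    out_parts = random_partition(4, 2, rng)  # 4 output units over 2 processors
    mask = partition_mask(in_parts, out_parts)
    for row in mask:
        print(row)
    print("kept fraction:", kept_fraction(mask))
```

Every connection that crosses a processor boundary is masked out, so the resulting mask plays
the role of a DropConnect mask while each processor only touches its own units.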
> There have already been attempts to parallelize SGD-based training of large-scale deep
learning models on distributed systems. The basic concept is that each worker trains a copy
of the model, and the workers either combine their results synchronously or update through a
centralized parameter server asynchronously. For a large model, layer-wise model parallelism
based on matrix operations is generally used. However, this leads to a large communication
overhead between host and device, or between hosts or devices (see the image below).
> !https://4.bp.blogspot.com/-S6-akP8wGOE/V0eU9DrzESI/AAAAAAAAF-o/qAKZ08VgJDo9ZPJFHt1SXnfZ26yueBY2gCLcB/s640/modelparallel.png!
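
The synchronous variant described above (each worker holds a model copy and the results are
combined every step) can be sketched on a toy one-parameter model. Averaging the per-worker
gradients is an assumed combination rule, chosen because it is a common one; the workers are
simulated by a plain loop rather than real processes.

```python
def gradient(w, shard):
    """Gradient of the mean squared error of the model y = w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def sync_sgd_step(w, shards, lr):
    """One synchronous step: each worker computes a gradient, then all are averaged."""
    grads = [gradient(w, shard) for shard in shards]  # would run in parallel per worker
    return w - lr * sum(grads) / len(grads)

if __name__ == "__main__":
    # Two workers, each with its own subset of data generated by y = 2 * x.
    shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
    w = 0.0
    for _ in range(50):
        w = sync_sgd_step(w, shards, lr=0.05)
    print(w)
```

Only one scalar gradient per worker is exchanged here; for a real network the exchanged
gradients or parameters are as large as the model itself, which is where the communication
overhead mentioned above comes from.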
> In contrast, my basic approach is as follows: we assign the training data and a model copy
to a number of worker groups. Then, each group divides the large model irregularly into a few
disconnected sub-models of the parent model, so that each worker runs independently of each

This message was sent by Atlassian JIRA
