horn-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HORN-27) Effective Parallel Training of Large Deep DropConnect Neural Networks
Date Wed, 01 Jun 2016 10:49:59 GMT
Edward J. Yoon created HORN-27:

             Summary: Effective Parallel Training of Large Deep DropConnect Neural Networks
                 Key: HORN-27
                 URL: https://issues.apache.org/jira/browse/HORN-27
             Project: Apache Horn
          Issue Type: Bug
            Reporter: Edward J. Yoon

As you might already know, training large-scale deep ANN architectures, such as Convolutional
Neural Nets (CNNs) and Recurrent Neural Nets (RNNs), is challenging: the training of large models
not only has to be parallelized, but it is also quite prone to overfitting because of the large
size of the network, even with large data sets. Two popular techniques for regularizing artificial
neural networks are DropOut [1] and DropConnect [2], which randomly drop hidden units (DropOut)
or individual connections (DropConnect) during training.
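
To make the difference between the two techniques concrete, here is a minimal sketch of a single layer's forward pass under each of them. The function names, the `keep_prob` parameter and the plain nested-list weight layout are assumptions for illustration only; the activation function and the rescaling that real implementations apply are left out.

```python
import random

def dropout_forward(x, weights, keep_prob, rng):
    """DropOut-style pass: one mask bit per hidden unit.

    weights[j][i] connects input i to hidden unit j (assumed layout).
    A dropped unit outputs 0, so all of its connections vanish together.
    """
    out = []
    for row in weights:
        activation = sum(w * xi for w, xi in zip(row, x))
        keep = rng.random() < keep_prob
        out.append(activation if keep else 0.0)
    return out

def dropconnect_forward(x, weights, keep_prob, rng):
    """DropConnect-style pass: one mask bit per individual connection."""
    out = []
    for row in weights:
        # Each weight is kept or dropped independently of the others.
        out.append(sum(w * xi for w, xi in zip(row, x)
                       if rng.random() < keep_prob))
    return out
```

With `keep_prob` set to 1.0 both functions reduce to the ordinary dense layer, which is a quick way to sanity-check them.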

This is just a rough idea at the moment. I'm thinking about an ensemble concept combining DropOut
and DropConnect that allows distributed parallel training with small communication requirements.
The core idea is to create many model replicas trained on different subsets of the data, and
to partition each network model randomly across multiple processors, thus dropping connections
and achieving locality of computation at the same time.

There have already been attempts to parallelize SGD-based training of
large-scale deep learning models on distributed systems. The basic concept
is that each worker trains a copy of the model, and the workers either
combine their results synchronously or update through a centralized
parameter server asynchronously. For large models, layer-wise model
parallelism based on matrix operations is generally used. However, this
leads to a large communication overhead between host and device, or
between hosts or devices.
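
For reference, the synchronous variant of the existing approach can be sketched roughly as follows. This is a toy example under stated assumptions, not how any particular system implements it: the model is a single weight `w` for y = w * x, each "worker" is just a data shard processed in a loop, and the names `local_gradient` and `synchronous_sgd` are hypothetical.

```python
def local_gradient(w, shard):
    """Mean-squared-error gradient of the 1-D model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def synchronous_sgd(shards, w=0.0, lr=0.05, steps=200):
    """Every worker holds the same w; gradients are averaged each step."""
    for _ in range(steps):
        # Each worker computes a gradient on its own shard of the data.
        grads = [local_gradient(w, shard) for shard in shards]
        # Synchronous combine: all workers wait, then apply the average.
        w -= lr * sum(grads) / len(grads)
    return w
```

The combine step is where the communication cost comes from: every step requires all workers to exchange their gradients (or parameters) before anyone can continue.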

In contrast, my basic approach is as follows: we assign the training data and a model copy to
each of a number of worker groups. Then, each group divides the large model irregularly into a few
disconnected sub-models of the parent model, so that each worker runs independently of the others.
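
A rough sketch of the partitioning step, assuming a plain fully connected network described only by its layer sizes: every unit is assigned to one worker at random, and a connection survives only if both of its endpoints belong to the same worker. The uniform random assignment and the names `random_partition`, `connection_mask` and `sub_model_masks` are assumptions for illustration; the idea above does not fix how the irregular split is chosen.

```python
import random

def random_partition(layer_sizes, num_workers, rng):
    """Assign every unit of every layer to one worker, uniformly at random."""
    return [[rng.randrange(num_workers) for _ in range(n)]
            for n in layer_sizes]

def connection_mask(owners_in, owners_out):
    """mask[j][i] is 1 only if input unit i and output unit j share a worker."""
    return [[1 if o == i else 0 for i in owners_in] for o in owners_out]

def sub_model_masks(layer_sizes, num_workers, seed=0):
    """Return the unit owners and one connection mask per pair of layers."""
    rng = random.Random(seed)
    owners = random_partition(layer_sizes, num_workers, rng)
    masks = [connection_mask(owners[k], owners[k + 1])
             for k in range(len(layer_sizes) - 1)]
    return owners, masks
```

Because every cross-worker connection is masked out, the surviving weights form sub-models that share no connections, so each worker can run its forward and backward pass without exchanging activations with the others. The dropped connections play the role of a DropConnect mask.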

