horn-dev mailing list archives

From "Thomas Jungblut (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HORN-2) Umbrella ticket for Implementation Planning of Apache Horn
Date Sun, 25 Oct 2015 10:41:27 GMT

    [ https://issues.apache.org/jira/browse/HORN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973169#comment-14973169 ]

Thomas Jungblut commented on HORN-2:

Maybe we should start by gathering some requirements and voting on them (maybe through Confluence/mail).
Let me draft a small list; please extend it wherever possible until it is complete.

Use Case (I have data in HDFS, I want to do deep learning)
1) I want my models as fast as possible
2) I want my models as cheap as possible
3) some trade-off between the above, depending on hardware

Targeted Hardware
1) Commodity CPU clusters
2) Commodity GPU clusters
3) HPC hardware with CPUs (e.g. remote memory access, InfiniBand)
4) HPC hardware with GPUs (e.g. remote memory access, InfiniBand)
5) some mix of the above

Computation Engine
1) Running on Hama
2) Running on Spark
3) Running on YARN
4) Agnostic of a computation layer

Network Size
1) simple multi-layer perceptrons (up to three layers, single output)
2) GoogLeNet ~ 50 layers, up to three outputs
3) Deeper nets, up to 100 layers, up to 20 outputs
4) RNNs, unfolded ~150 layers, up to 100 outputs
5) something even deeper

Network Definition
1) Neuron centric
2) Layer-wise
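
Since the neuron-centric vs. layer-wise choice shapes the whole user-facing API, here is a rough Python sketch of what each style might look like (all class and function names here are hypothetical illustrations for the discussion, not an actual Horn API):

```python
import math

# 1) Neuron-centric: the user writes one neuron's behavior; the framework
#    replicates it across a layer and wires the connections.
class SigmoidNeuron:
    def forward(self, inputs, weights):
        z = sum(i * w for i, w in zip(inputs, weights))
        return 1.0 / (1.0 + math.exp(-z))

# 2) Layer-wise: the user composes whole layers; the framework owns the
#    per-neuron loop (and is free to vectorize it).
def dense_sigmoid_layer(inputs, weight_matrix):
    neuron = SigmoidNeuron()
    return [neuron.forward(inputs, row) for row in weight_matrix]

out = dense_sigmoid_layer([1.0, 0.5], [[0.2, -0.1], [0.4, 0.3]])  # two neurons
```

The neuron-centric style is simpler for users but hides the layer structure from the engine; the layer-wise style exposes whole-layer operations that a GPU backend could exploit.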

NeuralNetwork Framework?
1) no, not needed and we do our own stuff
2) leverage Caffe
3) leverage Torch
4) leverage X

Training Algorithm
1) SGD (gradient descent on mini-batches)
2) Conjugate Gradient
3) any of these
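
For context on option 1, a minimal Python sketch of mini-batch SGD (a generic illustration, not tied to any Horn API; `grad_fn` and the toy linear fit are assumptions made up for the example):

```python
import random

def sgd_minibatch(grad_fn, params, data, batch_size=32, lr=0.01, epochs=1, seed=0):
    """Plain mini-batch SGD: average the gradient over each batch, step downhill."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grads = [grad_fn(params, example) for example in batch]
            # average the per-example gradients, then take one step
            avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(params))]
            params = [p - lr * g for p, g in zip(params, avg)]
    return params

# toy example: fit w so that w*x matches noiseless targets y = 3x
data = [(x, 3.0 * x) for x in range(1, 11)]
grad = lambda params, ex: [2.0 * (params[0] * ex[0] - ex[1]) * ex[0]]
w = sgd_minibatch(grad, [0.0], data, batch_size=4, lr=0.005, epochs=50)
```

On this toy problem `w` converges to roughly 3.0; the point is only that mini-batch SGD needs nothing from the engine beyond shuffled batches and a gradient callback.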

Please add more of these; I'd like to compile a full list by the end of the coming week
so we can open a vote on our requirements to drive the design.

> Umbrella ticket for Implementation Planning of Apache Horn
> ----------------------------------------------------------
>                 Key: HORN-2
>                 URL: https://issues.apache.org/jira/browse/HORN-2
>             Project: Apache Horn
>          Issue Type: Wish
>            Reporter: Edward J. Yoon
> My old rough idea is described here: http://blog.udanax.org/2015/06/googles-distbelief-clone-project-on.html
> The basic idea of data and model parallelism is to use a remote parameter server to
parallelize model creation and distribute training across machines, and to use region barrier
synchronization per task group (instead of global barrier synchronization) to perform asynchronous
mini-batches within a single BSP job.
> Since Apache Hama provides a pluggable interface for synchronization [1], we can easily
create our own region barrier synchronization service for handling multiple BSP worker groups
(regarding management of the task topology, I have no concrete idea yet).
> The parameter server requires a decision on whether to reuse an existing open-source implementation or implement our own.
> My rough programming interface design is focused only on feed-forward networks such as
MLP, CNN, and autoencoders. We may want to cover everything.
> 1. http://wiki.apache.org/hama/SyncService
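
To make the asynchronous, no-global-barrier idea from the description above concrete, here is a rough single-process Python sketch of a parameter server with workers pushing gradients at their own pace (the class, shard, and worker names are hypothetical; a real implementation would be distributed across machines and build on Hama's sync service):

```python
import threading

class ParameterServer:
    """Minimal shared parameter store: workers pull the current weights and
    push gradient updates with no global barrier (asynchronous mini-batch SGD)."""

    def __init__(self, dim, lr=0.01):
        self.params = [0.0] * dim
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return list(self.params)

    def push(self, grad):
        with self.lock:
            self.params = [p - self.lr * g for p, g in zip(self.params, grad)]

def worker(ps, shard, steps):
    # each worker group trains on its own data shard at its own pace;
    # gradients may be computed against slightly stale parameters
    for step in range(steps):
        w = ps.pull()
        x, y = shard[step % len(shard)]
        grad = [2.0 * (w[0] * x - y) * x]  # gradient of (w*x - y)^2
        ps.push(grad)

ps = ParameterServer(dim=1, lr=0.01)
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
threads = [threading.Thread(target=worker, args=(ps, s, 200)) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

On this noiseless toy problem every shard agrees on the optimum (w = 2), so the stale, unsynchronized updates still converge; in general, asynchronous SGD trades some convergence noise for never having to stop the whole cluster at a global barrier.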
