horn-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HORN-2) Umbrella ticket for Implementation Planning of Apache Horn
Date Mon, 09 Nov 2015 02:04:11 GMT

    [ https://issues.apache.org/jira/browse/HORN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995931#comment-14995931 ]

Edward J. Yoon commented on HORN-2:
-----------------------------------

NOTE: if it proves difficult to support both the neuron-centric programming model and GPU acceleration on a heterogeneous
cluster, we can consider having two separate clusters, like Tencent's Mariana:

{code}
Tencent deep learning platform, which utilizes GPU and CPU cluster to train models parallelly
with three frameworks: 

  1) a multi-GPU data parallelism framework for deep neural networks (DNNs). 
  2) a multi-GPU model parallelism and data parallelism framework for deep convolutional neural
networks (CNNs). 
  3) a CPU cluster framework for large scale DNNs.
{code}

> Umbrella ticket for Implementation Planning of Apache Horn
> ----------------------------------------------------------
>
>                 Key: HORN-2
>                 URL: https://issues.apache.org/jira/browse/HORN-2
>             Project: Apache Horn
>          Issue Type: Wish
>            Reporter: Edward J. Yoon
>
> My old rough idea is described here: http://blog.udanax.org/2015/06/googles-distbelief-clone-project-on.html
> The basic idea of data and model parallelism is to use a remote parameter server to
parallelize model creation and distribute training across machines, and to use region barrier
synchronization per task group, instead of global barrier synchronization, for performing
asynchronous mini-batches within a single BSP job (a minimal sketch of the region barrier idea
follows after this quoted description).
> Since Apache Hama provides a pluggable interface for synchronization [1], we can easily
create our own region barrier synchronization service for handling multiple BSP worker groups
(regarding management of the task topology, I have no concrete idea yet).
> The parameter server requires a decision on whether to reuse an existing open source implementation
or implement it ourselves.
> My rough Programming Interface Design is focused only on feed-forward networks such as
MLP, CNN, and Autoencoder; we may eventually want to cover everything (a sketch of one possible
neuron-centric interface also follows below).
> 1. http://wiki.apache.org/hama/SyncService
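
To make the region barrier idea above a bit more concrete, here is a minimal, illustrative sketch of
per-task-group barriers in plain Java. RegionBarrierSketch and enterRegionBarrier are hypothetical
names, not Hama's SyncService API; a real implementation would plug into the synchronization
interface [1] and coordinate BSP tasks across machines rather than local threads.

{code}
// Illustrative only: per-task-group ("region") barriers, so workers in one
// group synchronize with each other without waiting on the whole cluster.
// RegionBarrierSketch and enterRegionBarrier are hypothetical names; this
// is not Hama's SyncService API, and threads stand in for BSP tasks.
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CyclicBarrier;

public class RegionBarrierSketch {
  private final ConcurrentMap<String, CyclicBarrier> barriers = new ConcurrentHashMap<>();

  // Each task group gets its own barrier, sized to that group only.
  public void enterRegionBarrier(String groupId, int groupSize)
      throws InterruptedException, BrokenBarrierException {
    CyclicBarrier barrier =
        barriers.computeIfAbsent(groupId, id -> new CyclicBarrier(groupSize));
    barrier.await();  // blocks only the peers in the same region
  }

  public static void main(String[] args) {
    RegionBarrierSketch sync = new RegionBarrierSketch();
    // Two independent regions of two workers each: group "A" never waits
    // for group "B", unlike a global barrier over all four tasks.
    for (String group : new String[] {"A", "B"}) {
      for (int i = 0; i < 2; i++) {
        final int task = i;
        new Thread(() -> {
          try {
            sync.enterRegionBarrier(group, 2);
            System.out.println("group " + group + " task " + task + " passed its region barrier");
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }).start();
      }
    }
  }
}
{code}

The only point of the sketch is that each task group waits on its own barrier, so one region's
mini-batch pace never blocks another region.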
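
Similarly, here is a rough, hypothetical sketch of what a neuron-centric programming interface for
feed-forward networks could look like; NeuronSketch, forward, and backward are assumptions made
for illustration only, not a committed Horn API.

{code}
// Illustrative only: one possible shape of a neuron-centric interface for
// feed-forward networks (MLP-style). All names here are assumptions made
// for this sketch, not a committed Apache Horn API.
import java.util.List;
import java.util.function.DoubleUnaryOperator;

abstract class NeuronSketch {
  protected double output;
  protected double delta;

  // Forward pass: combine weighted inputs from the previous layer.
  abstract void forward(List<Double> weightedInputs, DoubleUnaryOperator activation);

  // Backward pass: receive deltas from the next layer and compute this
  // neuron's delta; weight updates would go to the remote parameter server.
  abstract void backward(List<Double> deltasFromNextLayer);

  double getOutput() { return output; }
}

class SigmoidNeuron extends NeuronSketch {
  @Override
  void forward(List<Double> weightedInputs, DoubleUnaryOperator activation) {
    double sum = weightedInputs.stream().mapToDouble(Double::doubleValue).sum();
    output = activation.applyAsDouble(sum);
  }

  @Override
  void backward(List<Double> deltasFromNextLayer) {
    double incoming = deltasFromNextLayer.stream()
        .mapToDouble(Double::doubleValue).sum();
    // Sigmoid derivative: output * (1 - output).
    delta = incoming * output * (1.0 - output);
  }

  public static void main(String[] args) {
    SigmoidNeuron n = new SigmoidNeuron();
    n.forward(List.of(0.5, -0.25, 1.0), x -> 1.0 / (1.0 + Math.exp(-x)));
    n.backward(List.of(0.1, -0.05));
    System.out.println("output=" + n.getOutput() + " delta=" + n.delta);
  }
}
{code}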



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
