Date: Thu, 18 Jun 2015 03:53:00 +0000 (UTC)
From: "Edward J. Yoon (JIRA)"
To: dev@hama.apache.org
Reply-To: dev@hama.apache.org
Subject: [jira] [Commented] (HAMA-961) Parameter Server for large scale MLP

    [ https://issues.apache.org/jira/browse/HAMA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591195#comment-14591195 ]

Edward J. Yoon commented on HAMA-961:
-------------------------------------

I'm attaching the technical discussion I had with Yexi Jiang about this below.

{code}
Yes, partitioning the model is more difficult than partitioning the data.

Yes, you are correct, the traditional way is to use a parameter server to
store all the parameters.

Regards,
Yexi

2015-06-17 17:45 GMT-07:00 Edward J. Yoon:

Thanks,

> I implemented the distributed version. The data is partitioned while the
> model itself cannot be partitioned (each node will have a copy of the
> model). In each iteration, the computation is conducted on each node and a
> final aggregation is conducted in one node. Then the updated model will be
> synchronized to each node.

This means that the current implementation can only do data parallelism,
not model parallelism. Right?

For model parallelism, I'm thinking about using an external parameter server
and multi-threading, like this:
https://docs.google.com/drawings/d/1cjz50sGbpnFp2oab30cZ5MNYsaD3PtaBRVsUWuLiglI/edit?usp=sharing

Do you think this makes sense?

On Wed, Jun 17, 2015 at 11:10 PM, Yexi Jiang wrote:
> Hi, Edward,
>
> I implemented the distributed version. The data is partitioned while the
> model itself cannot be partitioned (each node will have a copy of the
> model). In each iteration, the computation is conducted on each node and a
> final aggregation is conducted in one node. Then the updated model will be
> synchronized to each node.
>
> The model is designed in a hierarchical way. The base class is more abstract
> than the derived classes. The ann and perceptron packages are somewhat
> similar, but the ann package is more flexible because the structure of the
> ann model can be freely set by the user, as long as it is a layered model.
> Therefore, the perceptron, auto-encoder, linear regressor, and logistic
> regressor can all be uniformly represented by an ANN.
>
> Regards,
> Yexi
>
> 2015-06-17 1:00 GMT-07:00 Edward J. Yoon:
>>
>> Hello,
>>
>> I've recently been reading your two patches, HAMA-681 and HAMA-770,
>> closely again (I'd like to improve and finish them), and I have some
>> questions.
>>
>> 1) You said you would implement the BSP training algorithm based on the
>> non-distributed version [1]. But it works in parallel with multiple
>> tasks. What is the exact meaning of "non-distributed"?
>>
>> 2) The ann and perceptron packages are somewhat similar. Is the
>> perceptron package now redundant? And, according to the comments,
>> SmallMLPTrainer uses mini-batch SGD while SmallLayeredNeuralNetworkTrainer
>> uses GD?
>>
>> Thanks!
>>
>> [1]. https://issues.apache.org/jira/browse/HAMA-770?focusedCommentId=13724503&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13724503
>>
>> --
>> Best Regards, Edward J. Yoon
>
> --
> Yexi Jiang,
> Homepage: http://yxjiang.github.io/

--
Best Regards, Edward J. Yoon
{code}
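To make the scheme discussed above concrete: every task keeps a full copy of
the model, computes a gradient on its own data shard, one task aggregates the
gradients and updates the model, and the updated model is then copied back to
every task. The following is only a minimal, self-contained sketch of that data
flow in plain Java; it deliberately does not use the Hama BSP API, and the
names (DataParallelSketch, localGradient) and the toy gradient arithmetic are
illustrative assumptions, not the HAMA-681/HAMA-770 code.

{code}
// Conceptual sketch only: replicated model, per-shard gradients, central
// aggregation, then a full-model "broadcast" (here just a shared array).
import java.util.Arrays;
import java.util.List;

public class DataParallelSketch {

  static final int DIM = 4;                  // toy model size
  static double[] model = new double[DIM];   // the replicated model

  // Stand-in for a task's local gradient over its data shard; the arithmetic
  // is a placeholder, not real backpropagation.
  static double[] localGradient(double[] model, List<double[]> shard) {
    double[] g = new double[DIM];
    for (double[] example : shard) {
      for (int i = 0; i < DIM; i++) {
        g[i] += example[i] - model[i];
      }
    }
    return g;
  }

  public static void main(String[] args) {
    // Two data shards, one per "task".
    List<List<double[]>> shards = Arrays.asList(
        Arrays.asList(new double[] {1, 2, 3, 4}),
        Arrays.asList(new double[] {2, 3, 4, 5}));
    double learningRate = 0.1;

    for (int superstep = 0; superstep < 20; superstep++) {
      double[] sum = new double[DIM];
      // 1) each task computes a gradient on its own shard (serial here)
      for (List<double[]> shard : shards) {
        double[] g = localGradient(model, shard);
        // 2) one task aggregates all the local gradients
        for (int i = 0; i < DIM; i++) sum[i] += g[i];
      }
      // 3) the aggregator updates the model; with replication, the whole
      //    updated model must then be sent back to every task.
      for (int i = 0; i < DIM; i++) {
        model[i] += learningRate * sum[i] / shards.size();
      }
    }
    System.out.println(Arrays.toString(model)); // drifts toward the example mean
  }
}
{code}

The cost that motivates this issue is visible in step 3: with a replicated
model, every superstep ships the entire parameter set back to every task, which
does not scale to large models.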
> Parameter Server for large scale MLP
> ------------------------------------
>
>                 Key: HAMA-961
>                 URL: https://issues.apache.org/jira/browse/HAMA-961
>             Project: Hama
>          Issue Type: Improvement
>          Components: machine learning
>    Affects Versions: 0.7.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.8.0
>
> I've recently started to review the MLP source code closely, and I'm thinking
> about some improvements and API refactoring, e.g., APIs for user-defined
> neuron and synapse models, data structures, etc.
> This issue is one of them, and it is related to training large models. I'm
> considering a distributed parameter server (http://parameterserver.org) for
> managing parameters.
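As a rough illustration of the parameter-server direction: instead of
replicating and broadcasting the whole model, each worker pulls only the
parameter blocks it currently needs (e.g., one layer's weights) and pushes
gradient updates for those blocks back. The sketch below is a hypothetical
single-process stand-in in plain Java; the ParameterServerSketch class and its
pull/push methods are assumptions made for illustration, not an existing Hama
API or the parameterserver.org interface.

{code}
// Hypothetical pull/push parameter-server interface (single process, one
// shard); in a real system the table would be partitioned across servers.
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ParameterServerSketch {

  private final Map<String, double[]> table = new HashMap<>();
  private final double learningRate;

  public ParameterServerSketch(double learningRate) {
    this.learningRate = learningRate;
  }

  // A worker pulls only the block it needs, e.g. one layer's weights.
  public synchronized double[] pull(String key, int size) {
    return table.computeIfAbsent(key, k -> new double[size]).clone();
  }

  // A worker pushes a gradient for that block; the server applies the update.
  public synchronized void push(String key, double[] gradient) {
    double[] w = table.computeIfAbsent(key, k -> new double[gradient.length]);
    for (int i = 0; i < gradient.length; i++) {
      w[i] -= learningRate * gradient[i];
    }
  }

  // Minimal usage: two worker threads training disjoint parameter blocks.
  public static void main(String[] args) throws InterruptedException {
    ParameterServerSketch ps = new ParameterServerSketch(0.01);
    Runnable worker = () -> {
      String block = "layer-" + Thread.currentThread().getName();
      double[] weights = ps.pull(block, 3);          // fetch current values
      double[] gradient = new double[weights.length];
      Arrays.fill(gradient, 1.0);                    // placeholder gradient
      ps.push(block, gradient);                      // send the update back
    };
    Thread t0 = new Thread(worker, "0");
    Thread t1 = new Thread(worker, "1");
    t0.start(); t1.start();
    t0.join(); t1.join();
    System.out.println(Arrays.toString(ps.pull("layer-0", 3))); // [-0.01, -0.01, -0.01]
  }
}
{code}

In a real deployment the key space would be sharded across multiple server
nodes, and updates would typically be applied asynchronously or with bounded
staleness rather than under a single lock as in this sketch.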