Date: Thu, 18 Jun 2015 03:53:00 +0000 (UTC)
From: "Edward J. Yoon (JIRA)"
To: dev@hama.apache.org
Reply-To: dev@hama.apache.org
Subject: [jira] [Commented] (HAMA-961) Parameter Server for large scale MLP

    [ https://issues.apache.org/jira/browse/HAMA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591195#comment-14591195 ]

Edward J. Yoon commented on HAMA-961:
-------------------------------------

I'm attaching the technical discussion I had with Yexi Jiang about this below.

{code}
Yes, partitioning the model is more difficult than partitioning the data.

Yes, you are correct, the traditional way is to use a parameter server to
store all the parameters.

Regards,
Yexi

2015-06-17 17:45 GMT-07:00 Edward J. Yoon:

Thanks,

> I implemented the distributed version. The data is partitioned while the
> model itself cannot be partitioned (each node will have a copy of the
> model). In each iteration, the computation is conducted on each node and a
> final aggregation is conducted in one node. Then the updated model will be
> synchronized to each node.

This means that the current implementation can only do data parallelism,
not model parallelism. Right?

For model parallelism, I'm thinking about using an external parameter server
and multi-threading, like this:
https://docs.google.com/drawings/d/1cjz50sGbpnFp2oab30cZ5MNYsaD3PtaBRVsUWuLiglI/edit?usp=sharing

Do you think this makes sense?

On Wed, Jun 17, 2015 at 11:10 PM, Yexi Jiang wrote:
> Hi, Edward,
>
> I implemented the distributed version. The data is partitioned while the
> model itself cannot be partitioned (each node will have a copy of the
> model). In each iteration, the computation is conducted on each node and a
> final aggregation is conducted in one node. Then the updated model will be
> synchronized to each node.
>
> The model is designed in a hierarchical way. The base class is more abstract
> than the derived classes. The ann and perceptron packages are somewhat
> similar, but the ann package is more flexible because the structure of the
> ann model can be freely set by the user, as long as it is a layered model.
> Therefore, the perceptron, auto-encoder, linear regressor, and logistic
> regressor can all be uniformly represented by an ANN.
>
> Regards,
> Yexi
>
> 2015-06-17 1:00 GMT-07:00 Edward J. Yoon:
>>
>> Hello,
>>
>> I've recently been reading your two patches, HAMA-681 and HAMA-770,
>> closely again (I'd like to improve and finish them), and I have some
>> questions.
>>
>> 1) You said you would implement the BSP training algorithm based on the
>> non-distributed version [1]. But it works in parallel with multiple
>> tasks. What is the exact meaning of "non-distributed"?
>>
>> 2) The ann and perceptron packages are somewhat similar. Is the
>> perceptron package now redundant? And, according to the comments,
>> SmallMLPTrainer uses mini-batch SGD while SmallLayeredNeuralNetworkTrainer
>> uses GD?
>>
>> Thanks!
>>
>> [1]. https://issues.apache.org/jira/browse/HAMA-770?focusedCommentId=13724503&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13724503
>>
>> --
>> Best Regards, Edward J. Yoon
>
> --
> Yexi Jiang,
> Homepage: http://yxjiang.github.io/

--
Best Regards, Edward J. Yoon
{code}
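To make the scheme discussed above concrete: every task keeps a full copy of
the model, computes a gradient on its own data shard, one task aggregates the
gradients and updates the model, and the updated model is then copied back to
every task. The following is only a minimal, self-contained sketch of that data
flow in plain Java; it deliberately does not use the Hama BSP API, and the
names (DataParallelSketch, localGradient) and the toy gradient arithmetic are
illustrative assumptions, not the HAMA-681/HAMA-770 code.

{code}
// Conceptual sketch only: replicated model, per-shard gradients, central
// aggregation, then a full-model "broadcast" (here just a shared array).
import java.util.Arrays;
import java.util.List;

public class DataParallelSketch {

  static final int DIM = 4;                  // toy model size
  static double[] model = new double[DIM];   // the replicated model

  // Stand-in for a task's local gradient over its data shard; the arithmetic
  // is a placeholder, not real backpropagation.
  static double[] localGradient(double[] model, List<double[]> shard) {
    double[] g = new double[DIM];
    for (double[] example : shard) {
      for (int i = 0; i < DIM; i++) {
        g[i] += example[i] - model[i];
      }
    }
    return g;
  }

  public static void main(String[] args) {
    // Two data shards, one per "task".
    List<List<double[]>> shards = Arrays.asList(
        Arrays.asList(new double[] {1, 2, 3, 4}),
        Arrays.asList(new double[] {2, 3, 4, 5}));
    double learningRate = 0.1;

    for (int superstep = 0; superstep < 20; superstep++) {
      double[] sum = new double[DIM];
      // 1) each task computes a gradient on its own shard (serial here)
      for (List<double[]> shard : shards) {
        double[] g = localGradient(model, shard);
        // 2) one task aggregates all the local gradients
        for (int i = 0; i < DIM; i++) sum[i] += g[i];
      }
      // 3) the aggregator updates the model; with replication, the whole
      //    updated model must then be sent back to every task.
      for (int i = 0; i < DIM; i++) {
        model[i] += learningRate * sum[i] / shards.size();
      }
    }
    System.out.println(Arrays.toString(model)); // drifts toward the example mean
  }
}
{code}

The cost that motivates this issue is visible in step 3: with a replicated
model, every superstep ships the entire parameter set back to every task, which
does not scale to large models.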
> Parameter Server for large scale MLP
> ------------------------------------
>
>                 Key: HAMA-961
>                 URL: https://issues.apache.org/jira/browse/HAMA-961
>             Project: Hama
>          Issue Type: Improvement
>          Components: machine learning
>    Affects Versions: 0.7.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.8.0
>
> I've recently started to review the MLP source code closely, and I'm thinking
> about some improvements and API refactoring, e.g., APIs for user-defined
> neuron and synapse models, data structures, etc.
> This issue is one of them, and it is related to training large models. I'm
> considering a distributed parameter server (http://parameterserver.org) for
> managing parameters.
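As a rough illustration of the parameter-server direction: instead of
replicating and broadcasting the whole model, each worker pulls only the
parameter blocks it currently needs (e.g., one layer's weights) and pushes
gradient updates for those blocks back. The sketch below is a hypothetical
single-process stand-in in plain Java; the ParameterServerSketch class and its
pull/push methods are assumptions made for illustration, not an existing Hama
API or the parameterserver.org interface.

{code}
// Hypothetical pull/push parameter-server interface (single process, one
// shard); in a real system the table would be partitioned across servers.
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ParameterServerSketch {

  private final Map<String, double[]> table = new HashMap<>();
  private final double learningRate;

  public ParameterServerSketch(double learningRate) {
    this.learningRate = learningRate;
  }

  // A worker pulls only the block it needs, e.g. one layer's weights.
  public synchronized double[] pull(String key, int size) {
    return table.computeIfAbsent(key, k -> new double[size]).clone();
  }

  // A worker pushes a gradient for that block; the server applies the update.
  public synchronized void push(String key, double[] gradient) {
    double[] w = table.computeIfAbsent(key, k -> new double[gradient.length]);
    for (int i = 0; i < gradient.length; i++) {
      w[i] -= learningRate * gradient[i];
    }
  }

  // Minimal usage: two worker threads training disjoint parameter blocks.
  public static void main(String[] args) throws InterruptedException {
    ParameterServerSketch ps = new ParameterServerSketch(0.01);
    Runnable worker = () -> {
      String block = "layer-" + Thread.currentThread().getName();
      double[] weights = ps.pull(block, 3);          // fetch current values
      double[] gradient = new double[weights.length];
      Arrays.fill(gradient, 1.0);                    // placeholder gradient
      ps.push(block, gradient);                      // send the update back
    };
    Thread t0 = new Thread(worker, "0");
    Thread t1 = new Thread(worker, "1");
    t0.start(); t1.start();
    t0.join(); t1.join();
    System.out.println(Arrays.toString(ps.pull("layer-0", 3))); // [-0.01, -0.01, -0.01]
  }
}
{code}

In a real deployment the key space would be sharded across multiple server
nodes, and updates would typically be applied asynchronously or with bounded
staleness rather than under a single lock as in this sketch.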