From: Chamath Abeysinghe
Date: Fri, 16 Mar 2018 16:47:18 +0530
Subject: Re: Sub projects in Language and run time for
 parameter servers [SYSTEMML-2083]
To: Matthias Boehm, dev@systemml.apache.org

Hi Matthias,

After going through the JIRA sub-projects and the references you provided,
I would like to draft a proposal focusing on the distributed Spark backend
project, because it seems a challenging and exciting area to explore :-).

I have sketched a rough design diagram and an implementation plan for the
proposal:
https://drive.google.com/file/d/1MTlYWvkkApe28vDOodDR8hmxzVx9QwQX/view?usp=sharing

My idea is to give the paramserv runtime a design similar to the ParFor
runtime, extended to handle parameter exchange. I will first work on the
primitives the runtime needs to manage the PS, and then implement a
parameter server in Spark. Initially it will use the synchronous method;
if time allows, I will experiment with other methods and performance
factors.

I also have a question about the control program. The project JIRA
mentions that "PS strategies will be selected by the user". Does this
include the architecture of the parameter server (number of workers and
servers), or does that need to be handled in the project?

I hope this plan aligns with the expectations of the community and does
not conflict with other GSoC candidates. Your feedback is highly
appreciated; if anything is wrong, please correct me.

Thanks

PS: I am resending this mail because it seems the previous mail, which had
an attachment, was not delivered to the dev mailing list.

Regards,
Chamath

On Fri, Mar 9, 2018 at 2:19 PM, Matthias Boehm wrote:

> Hi Chamath,
>
> ad 1: Yes, this is absolutely correct. However, it is important to realize
> that within the workers, we want to run DML functions, and for these we'll
> reuse our existing compiler, runtime, operations, and data structures.
>
> ad 2: Yes, this is also correct.
> Indeed, we can use an existing parfor (with local execution mode) to
> emulate a local, synchronous parameter server. However, it would be very
> hard, and would conflict with our functional and thus stateless execution
> semantics, to incorporate asynchronous updates and strategies such as
> Hogwild!. Furthermore, such a local parameter server might also have an
> application with very large models and batches, because this would enable
> distributed data-parallel operations spawned from each local worker.
>
> ad 3: Unfortunately, there is no single detailed architecture diagram
> because the system evolves over time. I would recommend looking at the
> following two papers, where especially [1] (the parfor paper, and its
> extensions for Spark in [2]) might give you a better idea of the parameter
> server and its workers, which are primarily meant to handle the
> orchestration and efficient parameter updates/exchange. If you're looking
> for coarse-grained components, then [3], slide 8, might be a starting
> point. At a high level, each operation and some constructs like parfor
> have physical operators for CP, SPARK, MR, and some for GPU. Similarly,
> this project aims to introduce a new paramserv builtin function (most
> similar to parfor) and its different physical operators.
>
> ad 4: Since this paramserv function is similar to parfor, we will be able
> to reuse key primitives for bringing up local/remote workers and shipping
> the compiled functions and input data. The major extensions will be to
> call the shipped functions per batch, get the returned (i.e., updated)
> parameters, and handle the exchange according to the paramserv
> configuration. However, since paramserv as an operation is implemented
> from scratch, we can customize as needed and are not restricted by
> script-level semantics, which renders the problem simpler than the
> general-purpose parfor construct. Both have their use cases.
>
> In case this did not clarify your questions, let us know and we'll sort
> it out.
>
> [1] http://www.vldb.org/pvldb/vol7/p553-boehm.pdf, 2014
> [2] http://www.vldb.org/pvldb/vol9/p1425-boehm.pdf, 2016
> [3] http://boss.dima.tu-berlin.de/media/BOSS16-Tutorial-mboehm.pdf, 2016
>
> Regards,
> Matthias
>
> On Thu, Mar 8, 2018 at 10:28 PM, Chamath Abeysinghe <
> abeysinghechamath@gmail.com> wrote:
>
>> Hi,
>> I am trying to understand the purpose of, and the work needed for, the
>> different sub-projects in SYSTEMML-2083. I have a few questions:
>>
>> * The JIRA mentions that we are not integrating an off-the-shelf
>> parameter server, but rather developing language and runtime support
>> from scratch. As far as I understand, this means creating DML syntax to
>> interact with the parameter server, while the parameter server itself is
>> implemented in the different back-ends. So, for example, in the Spark
>> back-end we have to create some kind of parameter server implementation
>> with different strategies, and it should be invoked through the DML
>> syntax. Is this understanding correct?
>>
>> * The JIRA has a sub-project for a local multi-threaded back-end. Does
>> "local" here mean executing on a single node, similar to ExecType.CP? If
>> so, why use a parameter server for a single node?
>>
>> * I was unable to find an architecture diagram for SystemML. Is there
>> any such diagram that shows the interaction between the different
>> back-ends and the language API, or can you point me to those classes?
>>
>> * Are the new runtimes going to be completely separate runtimes, or
>> improvements to the existing ones?
>>
>> Please help me understand these issues. Thanks in advance.
>>
>> Regards,
>> Chamath

-- 
Chamath Abeysinghe
Department of Computer Science and Engineering
University of Moratuwa
Mobile: +94712803295
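
For illustration only, the synchronous exchange described under "ad 4" in
the quoted reply (call a function per batch on each worker, collect the
results, apply one update) might be sketched as follows. This is a minimal
sketch under stated assumptions, not SystemML code: it is plain Python
rather than DML, the names gradient and sync_paramserv are hypothetical,
the 1-D linear model is a stand-in for a shipped DML function, and a thread
pool plays the role of the local workers.

```python
from concurrent.futures import ThreadPoolExecutor


def gradient(params, batch):
    """Hypothetical worker function: mean-squared-error gradient for the
    1-D linear model y = w*x + b on one batch of (x, y) pairs."""
    w, b = params
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return [gw, gb]


def sync_paramserv(params, partitions, epochs, lr, num_workers):
    """Synchronous exchange (assumed design): every worker reads the same
    parameters, the server waits for all gradients, averages them, and
    applies a single update per round."""
    params = list(params)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for _ in range(epochs):
            # list(pool.map(...)) returns only after every worker finished,
            # which acts as the synchronization barrier of this round.
            grads = list(pool.map(lambda part: gradient(params, part),
                                  partitions))
            avg = [sum(g[i] for g in grads) / len(grads)
                   for i in range(len(params))]
            params = [p - lr * g for p, g in zip(params, avg)]
    return params


# Toy data generated from y = 3x + 1, split into two equal partitions.
data = [(x, 3.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]]
partitions = [data[0::2], data[1::2]]
w, b = sync_paramserv([0.0, 0.0], partitions, epochs=1000, lr=0.05,
                      num_workers=2)
```

Because the server blocks until all workers have reported, each round sees
consistent parameters. An asynchronous strategy such as Hogwild! would drop
that barrier and let workers update shared state independently, which is
the part that conflicts with stateless execution semantics as noted under
"ad 2".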