From: Matthias Boehm
Date: Fri, 9 Mar 2018 00:49:01 -0800
Subject: Re: Sub projects in Language and run time for parameter servers [SYSTEMML-2083]
To: Chamath Abeysinghe
Cc: dev@systemml.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="001a1145ab942e6b600566f6dd8a"

--001a1145ab942e6b600566f6dd8a
Content-Type: text/plain; charset="UTF-8"

Hi Chamath,

ad 1: Yes, this is absolutely correct. However, it is important to realize that within the workers, we want to run DML functions, and for these we'll reuse our existing compiler, runtime, operations, and data structures.

ad 2: Yes, this is also correct. Indeed, we can use an existing parfor (with local execution mode) to emulate a local, synchronous parameter server. However, it would be very hard, and in conflict with our functional and thus stateless execution semantics, to incorporate asynchronous updates and strategies such as Hogwild!. Furthermore, such a local parameter server might also be useful for very large models and batches, because it would enable distributed data-parallel operations spawned from each local worker.

ad 3: Unfortunately, there is no single detailed architecture diagram because the system evolves over time. I would recommend looking at the following two papers, where especially [1] (the parfor paper, and its extensions for Spark in [2]) might give you a better idea of the parameter server and its workers, which are primarily meant to handle the orchestration and efficient parameter updates/exchange. If you're looking for coarse-grained components, then [3], slide 8 might be a starting point. At a high level, each operation and some constructs like parfor have physical operators for CP, SPARK, MR, and some for GPU. Similarly, this project aims to introduce a new paramserv builtin function (most similar to parfor) and its different physical operators.

ad 4: Since this paramserv function is similar to parfor, we will be able to reuse key primitives for bringing up local/remote workers and for shipping the compiled functions and input data. The major extensions will be to call the shipped functions per batch, get the returned (i.e., updated) parameters, and handle the exchange according to the paramserv configuration.
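
The per-batch loop described in ad 4 (call the shipped function per batch, collect the updated parameters, handle the exchange) can be sketched roughly as follows. This is only an illustration in Python, not DML and not SystemML's actual API: the names `sgd_step` and `run_sync_paramserv`, the toy linear model, and the choice of averaging the returned parameters are all assumptions made for the example. It models the local, synchronous case with one thread per worker.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Params = List[float]
Batch = Sequence[Tuple[float, float]]  # (x, y) pairs


def sgd_step(params: Params, batch: Batch, lr: float) -> Params:
    """Hypothetical 'shipped' function: one SGD step on the MSE of y ~ w*x + b.

    Returns the updated parameters, as a worker would send them back.
    """
    w, b = params
    n = len(batch)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return [w - lr * gw, b - lr * gb]


def run_sync_paramserv(
    params: Params,
    partitions: Sequence[Batch],
    update_fn: Callable[[Params, Batch, float], Params],
    epochs: int = 500,
    lr: float = 0.2,
    batch_size: int = 2,
) -> Params:
    """Local, synchronous sketch: one thread per data partition (worker)."""
    params = list(params)
    steps = min(len(p) for p in partitions) // batch_size
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        for _ in range(epochs):
            for s in range(steps):
                lo = s * batch_size
                batches = [p[lo:lo + batch_size] for p in partitions]
                current = params  # every worker starts from the same model
                # Synchronous barrier: list() waits for all workers to return.
                returned = list(
                    pool.map(lambda bt: update_fn(current, bt, lr), batches)
                )
                # Assumed exchange strategy: average the returned parameters.
                params = [
                    sum(r[i] for r in returned) / len(returned)
                    for i in range(len(current))
                ]
    return params


# Toy, noise-free data generated from y = 2x + 1, split across two workers.
DATA = [(i / 4.0, 2.0 * (i / 4.0) + 1.0) for i in range(8)]
PARTITIONS = [DATA[:4], DATA[4:]]

if __name__ == "__main__":
    print(run_sync_paramserv([0.0, 0.0], PARTITIONS, sgd_step))
```

On this toy data the result should approach w = 2 and b = 1. An asynchronous variant would drop the barrier and let each worker push its update as soon as it finishes, which is exactly the kind of stateful behavior that is hard to express with parfor.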
However, since paramserv as an operation is implemented from scratch, we can customize it as needed and are not restricted by script-level semantics, which renders the problem simpler than for the general-purpose parfor construct. Both have their use cases.

In case this did not clarify your questions, let us know and we'll sort it out.

[1] http://www.vldb.org/pvldb/vol7/p553-boehm.pdf, 2014
[2] http://www.vldb.org/pvldb/vol9/p1425-boehm.pdf, 2016
[3] http://boss.dima.tu-berlin.de/media/BOSS16-Tutorial-mboehm.pdf, 2016

Regards,
Matthias

On Thu, Mar 8, 2018 at 10:28 PM, Chamath Abeysinghe <abeysinghechamath@gmail.com> wrote:

> Hi,
> I am trying to understand the purpose and work needed for different sub
> projects in SYSTEMML-2083. And I got few questions,
>
> * In the JIRA it was mentioned that we are not integrating off the shelf
> Parameter Server, but rather develop language and run time support from
> scratch. As far as I understand, this means creating syntax for DML to
> interact with the parameter server. And the parameter server implementation
> is in different back-ends. So for example in Spark back end we have to
> create a some kind of parameter server implementation with different
> strategies, and it should be invoked by the syntax in DML. Is this
> understanding correct?
>
> * In the JIRA there is a sub project for local multi threaded back-end. In
> this project does "local" mean executing on single node similar to
> ExecType.CP? If it is the case why use a parameter server for a single
> node?
>
> * I was unable to find a architecture diagram for SystemML, is there any
> that kind of diagram to understand the interaction between different
> back-ends and language API or can you point me to those classes?
>
> * And those new run times, are they going to be completely new separate
> run times or improvements to the existing ones?
>
> Please help me understand these issues. Thanks in advance.
>
> Regards,
> Chamath
>

--001a1145ab942e6b600566f6dd8a--