From: Matthias Boehm
Date: Sun, 18 Jun 2017 23:03:48 -0700
Subject: Re: On the need for Parameter Server. ( A Model Parallel Construct )
To: dev@systemml.apache.org

Well, at a high level, we could emulate synchronous model parallelism via our existing parfor construct out of the box. If this is sufficient from an algorithm perspective, I would be in favor of making any necessary improvements there instead of introducing a new construct for parameter servers.

There are a couple of reasons for that. First, given the variety of backends and potential execution plans, it's usually hard work to integrate such a construct well with the rest of the system. Second, a custom parameter server would need to be either integrated with Spark, or (if implemented from scratch) with a number of different cluster resource managers (e.g., YARN, Mesos, Kubernetes, etc.). Third, extending the existing parfor construct as necessary would potentially also benefit other scripts.

Asynchronous model parallelism might also be possible to integrate into parfor. I remember discussions on state exchange between parfor workers (e.g., for KMeans to find out if at least one run converged already). Maybe this is a good time to introduce this, which would allow the update and broadcast of models in this context.

Regards,
Matthias

On Sun, Jun 18, 2017 at 10:16 PM, Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com> wrote:

> Dear committers,
>
> Implementation/Integration of existing parameter server for the execution
> of algorithms in a distributed fashion both for the machine learning and
> deep learning.
>
> The following document covers a bit about whether we need one or not?
>
> My name is Janardhan, currently working on [SYSTEMML-1437] implementation
> of factorization machines, which are to be sparse-safe and scalable, to
> stick to this philosophy we might need a model parallel construct. I know
> very little about how systemml exactly works. If you find some *7 minutes*
> please have a look at this doc.
>
> Parameter Server: a model parallel construct
> 3VF51i6xAjNCEC9I/edit?usp=drive_web>