systemml-dev mailing list archives

From Matthias Boehm <mboe...@googlemail.com>
Subject Re: On the need for Parameter Server. ( A Model Parallel Construct )
Date Mon, 19 Jun 2017 06:03:48 GMT
Well, at a high level, we could emulate synchronous model parallelism via
our existing parfor construct out of the box. If this is sufficient from an
algorithm perspective, I would be in favor of making any necessary
improvements there instead of introducing a new construct for parameter
servers.
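
To make the idea concrete, here is a minimal sketch of one synchronous
model-parallel round emulated with a parallel for loop. It is plain Python
with a thread pool, not SystemML's parfor or DML, and every name in it
(update_block, synchronous_round, the toy data) is hypothetical. Each
parallel iteration updates one block of model coordinates from the same
model snapshot, and the result merge plays the role of the barrier.

```python
from concurrent.futures import ThreadPoolExecutor

def update_block(block, w, rows, residuals, lr):
    """Parallel iteration body: gradient step for one block of model coordinates."""
    n = len(rows)
    new_vals = {}
    for j in block:
        grad_j = sum(r * x[j] for (x, _), r in zip(rows, residuals)) / n
        new_vals[j] = w[j] - lr * grad_j
    return new_vals

def synchronous_round(w, rows, blocks, lr, pool):
    """One round: every block is updated from the same snapshot, then merged."""
    residuals = [sum(wj * xj for wj, xj in zip(w, x)) - y for x, y in rows]
    parts = pool.map(lambda b: update_block(b, w, rows, residuals, lr), blocks)
    # Merging the results is the synchronization point: the new model
    # is only assembled after every block update has finished.
    new_w = list(w)
    for part in parts:
        for j, value in part.items():
            new_w[j] = value
    return new_w

# Toy least-squares problem (assumed for illustration): y = 2*x0 - 3*x1.
rows = [((float(i % 5), float(i % 3)), 2.0 * (i % 5) - 3.0 * (i % 3))
        for i in range(60)]
blocks = [[0], [1]]          # the model is split into two coordinate blocks
w = [0.0, 0.0]
with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
    for _ in range(300):
        w = synchronous_round(w, rows, blocks, lr=0.1, pool=pool)
```

Because every block reads the same snapshot, the merged result equals an
ordinary full gradient step, so the sketch only illustrates the control
flow, not a performance benefit.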

There are a couple of reasons for that. First, given the variety of
backends and potential execution plans, it's usually hard work to integrate
such a construct well with the rest of the system. Second, a custom
parameter server would need to be either integrated with Spark, or (if
implemented from scratch) with a number of different cluster resource
managers (e.g., YARN, Mesos, or Kubernetes). Third, extending the
existing parfor construct as necessary would potentially also benefit other
scripts.

Asynchronous model parallelism might also be possible to integrate into
parfor. I remember discussions on state exchange between parfor workers
(e.g., for KMeans to find out if at least one run converged already). Maybe
this is a good time to introduce this, which would allow the update and
broadcast of models in this context.
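
The kind of state exchange mentioned above can be sketched as a shared flag
that one worker sets and the others poll. This is plain Python threading,
not an existing SystemML feature. The run function is a hypothetical
stand-in for one iterative run (e.g., one KMeans initialization), and
steps_needed is an assumed placeholder for a real convergence test.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run(run_id, steps_needed, done, max_steps=10_000):
    """Stand-in for one iterative run that shares a 'converged' flag."""
    for step in range(1, max_steps + 1):
        if done.is_set():
            # Another worker already converged: stop early.
            return {"run": run_id, "steps": step - 1, "converged": False}
        if step >= steps_needed:   # hypothetical convergence test
            done.set()             # publish the state to all other workers
            return {"run": run_id, "steps": step, "converged": True}
    return {"run": run_id, "steps": max_steps, "converged": False}

done = threading.Event()           # shared state between the workers
needs = [50, 5000, 8000, 9000]     # assumed per-run iteration counts
with ThreadPoolExecutor(max_workers=len(needs)) as pool:
    results = list(pool.map(lambda a: run(a[0], a[1], done), enumerate(needs)))
```

How many of the slower runs stop early depends on thread scheduling, but at
least one run always reports convergence, since the flag is only set by a
converged run. Exchanging a full model instead of a flag would need the same
mechanism with a lock-protected shared value.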

Regards,
Matthias

On Sun, Jun 18, 2017 at 10:16 PM, Janardhan Pulivarthi <
janardhan.pulivarthi@gmail.com> wrote:

> Dear committers,
>
> I would like to discuss the implementation/integration of an existing
> parameter server for the distributed execution of both machine learning
> and deep learning algorithms.
>
> The following document covers a bit about whether we need one or not.
>
> My name is Janardhan, and I am currently working on [SYSTEMML-1437], the
> implementation of factorization machines, which are meant to be
> sparse-safe and scalable. To stick to this philosophy, we might need a
> model-parallel construct. I know very little about how SystemML works
> internally. If you can spare some *7 minutes*, please have a look at this
> doc.
>
> Parameter Server: a model parallel construct
> <https://docs.google.com/document/d/1AOW53numMJSF_msGvo1lekpyv7_3VF51i6xAjNCEC9I/edit?usp=drive_web>
>
