systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <>
Subject [jira] [Commented] (SYSTEMML-2083) Language and runtime for parameter servers
Date Thu, 22 Feb 2018 07:03:00 GMT


Matthias Boehm commented on SYSTEMML-2083:

Great to hear that, [~Guobao] - I would recommend subscribing to our mailing list, but you
can also view the archive. I will post some organizational comments there shortly to bring
everybody onto the same page. Subsequently, we can enter more detailed discussions regarding
the topic.

Just to clarify one aspect to ensure the topic is not misunderstood: this epic is NOT about
integrating an "off-the-shelf" parameter server, but about building language and runtime support
from scratch in SystemML. However, SystemML already supports general linear algebra programs
as well as deep-learning primitives, which we will reuse completely. The major components to
develop here are the API extensions (a builtin function, a Keras2DML extension) as well as local
and distributed runtime support. The runtime support primarily involves data and program shipping
(where we will reuse many primitives from the existing runtime of parfor loops), efficient
parameter exchange, and a modular design for different update strategies (see the sketch below).
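
To make that last point concrete, here is a minimal sketch of such a modular design - a server
exposing push/pull primitives with the update rule factored out behind an interface. All class
and method names are hypothetical, not the actual SystemML runtime API:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Pluggable update rule, so different strategies can share the same server.
interface UpdateStrategy {
  double[] update(double[] params, double[] gradient);
}

// Plain SGD as one concrete strategy.
class SgdUpdate implements UpdateStrategy {
  private final double lr;
  SgdUpdate(double lr) { this.lr = lr; }
  public double[] update(double[] params, double[] gradient) {
    double[] out = params.clone();
    for (int i = 0; i < out.length; i++)
      out[i] -= lr * gradient[i];
    return out;
  }
}

class ParamServer {
  private final Map<String, double[]> model = new ConcurrentHashMap<>();
  private final UpdateStrategy strategy;
  ParamServer(UpdateStrategy strategy) { this.strategy = strategy; }

  // Parameters must be initialized before workers push gradients.
  void init(String key, double[] params) {
    model.put(key, params.clone());
  }

  // Workers push gradients; the server applies the configured update rule.
  void push(String key, double[] gradient) {
    model.compute(key, (k, v) -> strategy.update(v, gradient));
  }

  // Workers pull the current parameters.
  double[] pull(String key) {
    return model.get(key).clone();
  }
}
{code}

In such a design, a synchronous mode would have workers pull, compute a gradient, push, and
then barrier per batch, while asynchronous variants would simply skip the barrier.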

> Language and runtime for parameter servers
> ------------------------------------------
>                 Key: SYSTEMML-2083
>                 URL:
>             Project: SystemML
>          Issue Type: Epic
>            Reporter: Matthias Boehm
>            Priority: Major
>              Labels: gsoc2018
>         Attachments: image-2018-02-14-12-18-48-932.png, image-2018-02-14-12-21-00-932.png,
> SystemML already provides a rich set of execution strategies, ranging from local operations
to large-scale computation on MapReduce or Spark. In this context, we support both data-parallel
(multi-threaded or distributed) operations as well as task-parallel computation (multi-threaded
or distributed parfor loops). This epic aims to complement the existing execution strategies
with language and runtime primitives for parameter servers, i.e., model-parallel execution.
We use the terminology of model-parallel execution with distributed data and distributed model
to differentiate it from the existing data-parallel operations. Target applications are
distributed deep learning and mini-batch algorithms in general. These new abstractions will
help make SystemML a unified framework for small- and large-scale machine learning that
supports all three major execution strategies in a single framework.
> A major challenge is the integration of stateful parameter servers and their common push/pull
primitives into an otherwise functional (and thus stateless) language. We will approach this
challenge via a new builtin function {{paramserv}}, which internally maintains state but at
the same time fits into the runtime framework of stateless operations.
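> To illustrate the stateless-from-the-outside idea, the following minimal sketch (simplified,
with illustrative names only, not the planned {{paramserv}} signature) confines all mutable
state to a single call that returns a new model, so the caller sees a pure function:

{code:java}
import java.util.Arrays;
import java.util.List;

class ParamservSketch {

  // Looks like a stateless builtin from the outside: the input model is not
  // modified, and all parameter-server state is confined to this invocation.
  static double[] paramserv(double[] model, List<double[]> batches,
                            double lr, int epochs) {
    double[] params = model.clone(); // internal mutable state
    for (int e = 0; e < epochs; e++)
      for (double[] batch : batches) {
        double[] grad = gradient(params, batch); // worker step
        for (int i = 0; i < params.length; i++)  // server update
          params[i] -= lr * grad[i];
      }
    return params; // new model; the input model is left untouched
  }

  // Toy gradient (pulls params towards the batch values), for illustration only.
  static double[] gradient(double[] params, double[] batch) {
    double[] g = new double[params.length];
    for (int i = 0; i < g.length; i++)
      g[i] = params[i] - batch[i % batch.length];
    return g;
  }

  public static void main(String[] args) {
    double[] model = {0.0, 0.0};
    List<double[]> data = Arrays.asList(new double[]{1.0, 2.0});
    System.out.println(Arrays.toString(paramserv(model, data, 0.1, 50)));
  }
}
{code}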
> Furthermore, we are interested in providing (1) different runtime backends (local and
distributed), (2) different parameter server modes (synchronous, asynchronous, Hogwild!,
stale-synchronous), (3) different update frequencies (batch, multi-batch, epoch), as well as
(4) different architectures for distributed data (1 parameter server, k workers) and
distributed model (k1 parameter servers, k2 workers).
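> The design space in (1)-(4) could be captured in a small configuration object; a hypothetical
sketch (illustrative names, not a committed API):

{code:java}
// Enumerations mirroring the design space above.
enum Backend { LOCAL, DISTRIBUTED }
enum UpdateMode { SYNC, ASYNC, HOGWILD, STALE_SYNC }
enum UpdateFrequency { BATCH, MULTI_BATCH, EPOCH }

class PsConfig {
  final Backend backend;
  final UpdateMode mode;
  final UpdateFrequency freq;
  final int numServers; // k1 in the distributed-model architecture
  final int numWorkers; // k2

  PsConfig(Backend backend, UpdateMode mode, UpdateFrequency freq,
           int numServers, int numWorkers) {
    this.backend = backend;
    this.mode = mode;
    this.freq = freq;
    this.numServers = numServers;
    this.numWorkers = numWorkers;
  }
}

// Example: new PsConfig(Backend.DISTRIBUTED, UpdateMode.ASYNC,
//                       UpdateFrequency.BATCH, 1, 8)
{code}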
> *Note for GSoC students:* This is a large project which will be broken down into
sub-projects, so everybody will have their share of the pie.
> *Prerequisites:* Java; machine learning experience is a plus but not required.
