systemml-issues mailing list archives

From "Chamath Abeysinghe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SYSTEMML-2083) Language and runtime for parameter servers
Date Tue, 13 Feb 2018 12:00:00 GMT

    [ https://issues.apache.org/jira/browse/SYSTEMML-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362193#comment-16362193 ]

Chamath Abeysinghe commented on SYSTEMML-2083:
----------------------------------------------

Hi,

I am Chamath Abeysinghe, a final-year undergraduate from the Department of Computer
Science and Engineering, University of Moratuwa. I am looking forward to participating in
GSoC 2018, and this project looks like an exciting one to me.

I have a sound knowledge of machine learning technologies and would like to see how they
are used in a distributed environment.

As a first step to get familiar with the project, I will go through the SystemML
documentation and the given research paper, and will try to build and experiment with the
code. I hope this is the right track. Any guidance would be very helpful. Thanks.

> Language and runtime for parameter servers
> ------------------------------------------
>
>                 Key: SYSTEMML-2083
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2083
>             Project: SystemML
>          Issue Type: Epic
>            Reporter: Matthias Boehm
>            Priority: Major
>              Labels: gsoc2018
>
> SystemML already provides a rich set of execution strategies ranging from local operations
to large-scale computation on MapReduce or Spark. In this context, we support both data-parallel
(multi-threaded or distributed operations) as well as task-parallel computation (multi-threaded
or distributed parfor loops). This epic aims to complement the existing execution strategies
by language and runtime primitives for parameter servers, i.e., model-parallel execution.
We use the terminology of model-parallel execution with distributed data and distributed model
to differentiate them from the existing data-parallel operations. Target applications are
distributed deep learning and mini-batch algorithms in general. These new abstractions will
help make SystemML a unified system for small- and large-scale machine learning that
supports all three major execution strategies in a single framework.
>  
> A major challenge is the integration of stateful parameter servers and their common push/pull
primitives into an otherwise functional (and thus, stateless) language. We will approach this
challenge via a new builtin function {{paramserv}} which internally maintains state but at
the same time fits into the runtime framework of stateless operations.
> Furthermore, we are interested in providing (1) different runtime backends (local and
distributed), (2) different parameter server modes (synchronous, asynchronous, hogwild!, stale-synchronous),
(3) different update frequencies (batch, multi-batch, epoch), as well as (4) different architectures
for distributed data (1 parameter server, k workers) and distributed model (k1 parameter servers,
k2 workers). 
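To make the core design idea above concrete, here is a minimal toy sketch (plain Python, not SystemML/DML code; all names are hypothetical) of how stateful push/pull primitives can wrap an otherwise stateless, functional update step, which is the integration problem the proposed {{paramserv}} builtin addresses:

```python
# Toy in-memory parameter server (hypothetical illustration, not the
# SystemML implementation). The server object holds the only mutable
# state; the worker's update function stays pure/stateless.

class ParamServer:
    def __init__(self, params):
        # Server-side state: the current model parameters.
        self.params = dict(params)

    def pull(self):
        # Workers pull a snapshot of the current model.
        return dict(self.params)

    def push(self, gradients, lr=0.1):
        # Workers push gradients; only the server mutates state.
        for name, grad in gradients.items():
            self.params[name] -= lr * grad


def worker_step(model, batch):
    # Stateless update function: maps (model, data) -> gradients.
    # Dummy gradient here: distance of each weight from its target.
    return {name: model[name] - batch[name] for name in model}


ps = ParamServer({"w": 1.0})
for _ in range(50):
    model = ps.pull()                       # pull current model
    grads = worker_step(model, {"w": 0.0})  # pure gradient computation
    ps.push(grads)                          # push updates to server state

print(ps.params["w"])  # decays toward the target 0.0
```

The point of the split is that `worker_step` could run as an ordinary stateless operation in the existing runtime, while all mutation is confined to the server's push/pull interface.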
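The difference between two of the listed parameter-server modes can be sketched in a few lines (again a hypothetical Python illustration, not SystemML code): a synchronous (BSP-style) server averages the gradients of all k workers before taking one update step, while an asynchronous server applies each worker's gradient as it arrives.

```python
# Contrast of synchronous vs. asynchronous aggregation on the server
# (toy sketch; real async servers also let workers read stale models).

def sync_update(params, worker_grads, lr=0.1):
    # Synchronous/BSP: average the k gradients, then apply ONE step.
    k = len(worker_grads)
    avg = {p: sum(g[p] for g in worker_grads) / k for p in params}
    return {p: params[p] - lr * avg[p] for p in params}


def async_update(params, worker_grads, lr=0.1):
    # Asynchronous: apply each gradient on arrival, one step per worker.
    for g in worker_grads:
        params = {p: params[p] - lr * g[p] for p in params}
    return params


grads = [{"w": 1.0}, {"w": 3.0}]
print(sync_update({"w": 1.0}, grads))   # one averaged step
print(async_update({"w": 1.0}, grads))  # two sequential steps
```

Hogwild! and stale-synchronous modes sit between these extremes by relaxing, respectively, locking on the shared state and the permitted staleness between the fastest and slowest worker.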



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
