systemml-issues mailing list archives

From "LI Guobao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SYSTEMML-2087) Initial version of distributed spark backend
Date Sun, 13 May 2018 15:53:00 GMT

     [ https://issues.apache.org/jira/browse/SYSTEMML-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LI Guobao updated SYSTEMML-2087:
--------------------------------
    Description: This part aims to implement the parameter server for the Spark distributed
backend. In general, we could launch a parameter server on a host to provide the pull and
push services. For the moment, all the weights and biases are stored in a hashmap under a
key, e.g., "global parameter". Each worker's gradients are put into the hashmap separately
under a given key. The exchange between the server and the workers will be implemented with
Netty RPC, so we can easily broadcast the server's IP address and port number to the workers;
the workers can then send their gradients and retrieve the new parameters via a TCP socket.
The server also spawns a thread that retrieves the gradients by polling the hashmap with the
relevant keys, aggregates them, and finally updates the global parameter in the hashmap.  (was:
This part aims to implement BSP (bulk synchronous parallel) execution for the Spark distributed
backend. Hence the idea is to be able to launch a remote parameter server and the workers.)
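
To make the design concrete, below is a minimal Java sketch of the in-memory parameter store
and the aggregator thread described above. All names here (ParamStore, GLOBAL_KEY, averaging
as the aggregation rule, the learning-rate update) are illustrative assumptions, and the
Netty RPC transport that would expose push/pull to remote workers is omitted; this is not
the actual SystemML implementation.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: a single hashmap holds the global parameters
// and each worker's gradients under separate keys.
public class ParamStore {

    private static final String GLOBAL_KEY = "global parameter";

    private final Map<String, double[]> store = new ConcurrentHashMap<>();

    public ParamStore(double[] initialParams) {
        store.put(GLOBAL_KEY, initialParams);
    }

    // Push service: a worker deposits its gradients under its own key.
    public void push(String workerKey, double[] gradients) {
        store.put(workerKey, gradients);
    }

    // Pull service: a worker retrieves the current global parameters.
    public double[] pull() {
        return store.get(GLOBAL_KEY);
    }

    // Aggregator thread: polls the hashmap for the workers' gradient keys,
    // averages whatever gradients have arrived (an assumed aggregation rule),
    // and updates the global parameter entry.
    public Thread startAggregator(String[] workerKeys, double learningRate) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    double[] sum = null;
                    int seen = 0;
                    for (String key : workerKeys) {
                        double[] grad = store.remove(key); // consume if present
                        if (grad == null)
                            continue;
                        if (sum == null)
                            sum = new double[grad.length];
                        for (int i = 0; i < grad.length; i++)
                            sum[i] += grad[i];
                        seen++;
                    }
                    if (seen > 0) {
                        double[] global = store.get(GLOBAL_KEY).clone();
                        for (int i = 0; i < global.length; i++)
                            global[i] -= learningRate * sum[i] / seen;
                        store.put(GLOBAL_KEY, global);
                    }
                    Thread.sleep(10); // poll interval
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }
}

Note that this sketch consumes each worker's gradients as they arrive, which roughly
corresponds to an asynchronous update scheme; a BSP variant would instead wait until all
worker keys are present in the hashmap before applying the update.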

> Initial version of distributed spark backend
> --------------------------------------------
>
>                 Key: SYSTEMML-2087
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2087
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: Matthias Boehm
>            Assignee: LI Guobao
>            Priority: Major
>
> This part aims to implement the parameter server for the Spark distributed backend. In
> general, we could launch a parameter server on a host to provide the pull and push
> services. For the moment, all the weights and biases are stored in a hashmap under a key,
> e.g., "global parameter". Each worker's gradients are put into the hashmap separately
> under a given key. The exchange between the server and the workers will be implemented
> with Netty RPC, so we can easily broadcast the server's IP address and port number to the
> workers; the workers can then send their gradients and retrieve the new parameters via a
> TCP socket. The server also spawns a thread that retrieves the gradients by polling the
> hashmap with the relevant keys, aggregates them, and finally updates the global parameter
> in the hashmap.
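
For the worker side of the exchange, a hypothetical loop could look like the following.
The server's host and port are assumed to have been broadcast to the worker beforehand,
and plain Java object serialization over a TCP socket stands in here for the proposed
Netty RPC layer; the PULL/PUSH message strings and the per-worker key are illustrative only.

import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.Socket;

// Hypothetical worker loop: pull the global parameters, compute local
// gradients, and push them back under this worker's key.
public class WorkerClient {

    public static void main(String[] args) throws Exception {
        String host = args[0];                  // broadcast by the server
        int port = Integer.parseInt(args[1]);   // broadcast by the server
        String workerKey = args[2];             // e.g. "gradients-worker-0"

        for (int epoch = 0; epoch < 10; epoch++) {
            try (Socket socket = new Socket(host, port);
                 ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
                 ObjectInputStream in = new ObjectInputStream(socket.getInputStream())) {

                // Pull: request the current global parameters.
                out.writeObject("PULL");
                out.flush();
                double[] params = (double[]) in.readObject();

                // Compute gradients locally (placeholder: all zeros).
                double[] gradients = new double[params.length];

                // Push: send the gradients under this worker's key.
                out.writeObject("PUSH");
                out.writeObject(workerKey);
                out.writeObject(gradients);
                out.flush();
            }
        }
    }
}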



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
