systemml-issues mailing list archives

From "LI Guobao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SYSTEMML-2087) Initial version of distributed spark backend
Date Sun, 24 Jun 2018 18:49:00 GMT

     [ https://issues.apache.org/jira/browse/SYSTEMML-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LI Guobao updated SYSTEMML-2087:
--------------------------------
    Description: This part aims to implement the parameter server (ps) for the Spark distributed
backend. In general, the implementation of the ps is very close to the local ps. The ps provides
the pull/push service to the workers from the driver node, and the communication between the
ps and the workers will be done via RPC. The data then needs to be distributed to the workers
according to the chosen data partitioning scheme. The worker setup and cleanup differ from
the local case and need to be handled separately.   (was: This part aims to implement the parameter
server for the Spark distributed backend. In general, we could launch a parameter server on
a host to provide the pull and push service. For the moment, all the weights and biases are
saved in a hashmap under a single key, e.g., "global parameter". Each worker's gradients will
be put into the hashmap separately under a given key. The exchange between the server and the
workers will be implemented with Netty RPC. Hence, we could easily broadcast the IP address
and the port number to the workers, and the workers can then send the gradients and retrieve
the new parameters via Netty RPC. The server will also spawn a thread which retrieves the
gradients by polling the hashmap with the relevant keys and aggregates them. Finally, it updates
the global parameter in the hashmap.)
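
The push/pull flow in the earlier description (a hashmap holding the global parameter plus one gradient entry per worker) can be sketched in a few lines. This is only an illustration, not the SystemML implementation: the class name ParamServer, the key format "gradients_<id>", and the averaged gradient-descent update are assumptions, and the Netty RPC layer and the polling thread are left out so that the example runs in a single process.

```python
from typing import Dict, List

GLOBAL_KEY = "global parameter"  # key name taken from the description above


class ParamServer:
    """Toy in-process parameter server backed by a dict (the 'hashmap')."""

    def __init__(self, init_params: List[float], learning_rate: float = 0.1):
        self.store: Dict[str, List[float]] = {GLOBAL_KEY: list(init_params)}
        self.learning_rate = learning_rate

    def push(self, worker_id: int, gradients: List[float]) -> None:
        # Each worker's gradients are stored separately under their own key
        # (the key format is hypothetical).
        self.store[f"gradients_{worker_id}"] = list(gradients)

    def pull(self) -> List[float]:
        # Workers receive a copy of the current global parameter.
        return list(self.store[GLOBAL_KEY])

    def aggregate(self, worker_ids: List[int]) -> bool:
        """One polling step: update only if every worker has pushed."""
        keys = [f"gradients_{w}" for w in worker_ids]
        if not all(k in self.store for k in keys):
            return False
        grads = [self.store.pop(k) for k in keys]
        # Assumed update rule: average the gradients, then take one SGD step.
        avg = [sum(col) / len(grads) for col in zip(*grads)]
        params = self.store[GLOBAL_KEY]
        self.store[GLOBAL_KEY] = [
            p - self.learning_rate * g for p, g in zip(params, avg)
        ]
        return True


ps = ParamServer([1.0, 2.0], learning_rate=0.5)
ps.push(0, [0.5, 1.0])
ps.push(1, [1.5, 0.0])
ps.aggregate([0, 1])
print(ps.pull())  # [0.5, 1.75]
```

In a distributed setting, push and pull would be the remote calls exposed to the workers, and aggregate would be the body of the server-side polling loop.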

> Initial version of distributed spark backend
> --------------------------------------------
>
>                 Key: SYSTEMML-2087
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2087
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: Matthias Boehm
>            Assignee: LI Guobao
>            Priority: Major
>
> This part aims to implement the parameter server (ps) for the Spark distributed backend. In general,
the implementation of the ps is very close to the local ps. The ps provides the pull/push service
to the workers from the driver node, and the communication between the ps and the workers will
be done via RPC. The data then needs to be distributed to the workers according to the chosen
data partitioning scheme. The worker setup and cleanup differ from the local case and need
to be handled separately.
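
The description does not name the data partitioning schemes, so the following is only a sketch of two plausible ones: contiguous blocks and round-robin assignment. The function names are hypothetical and the rows are plain Python sequences rather than Spark or SystemML data structures.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_contiguous(rows: Sequence[T], num_workers: int) -> List[List[T]]:
    """Split rows into consecutive blocks, one block per worker."""
    block = -(-len(rows) // num_workers)  # ceiling division
    return [list(rows[i * block:(i + 1) * block]) for i in range(num_workers)]


def partition_round_robin(rows: Sequence[T], num_workers: int) -> List[List[T]]:
    """Assign row i to worker i % num_workers."""
    return [list(rows[w::num_workers]) for w in range(num_workers)]


rows = list(range(7))
print(partition_contiguous(rows, 3))   # [[0, 1, 2], [3, 4, 5], [6]]
print(partition_round_robin(rows, 3))  # [[0, 3, 6], [1, 4], [2, 5]]
```

Both variants give every row to exactly one worker; they differ only in which rows end up together, which matters when the input is sorted or otherwise ordered.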



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
