systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SYSTEMML-2087) Initial version of distributed spark backend
Date Sun, 13 May 2018 01:00:00 GMT

    [ https://issues.apache.org/jira/browse/SYSTEMML-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473322#comment-16473322
] 

Matthias Boehm commented on SYSTEMML-2087:
------------------------------------------

Once we get closer to this task, it would be good to flesh out the details in terms of
subtasks. For example, we need to decide (1) how to distribute the data (for the different distribution
schemes) to the individual workers, (2) how to implement the parameter exchange, and (3) how
to handle task failures and preemption. Regarding the latter, I would recommend starting simple:
once a worker is brought up, it pulls the current state of the model, and
checkpointing is done in a centralized manner.
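
To make the "start simple" proposal concrete, here is a minimal single-process sketch in Python.
It is not SystemML code: the names ParameterServer, Worker, superstep, and checkpoint_every are
hypothetical, the "remote" server is just a local object, and averaging the gradients is only one
possible way to implement the parameter exchange. A worker pulls the current model when it is
brought up, and only the server takes checkpoints.

```python
class ParameterServer:
    """Hypothetical in-process stand-in for a remote parameter server."""

    def __init__(self, model, num_workers, checkpoint_every=2, lr=0.5):
        self.model = dict(model)
        self.num_workers = num_workers
        self.checkpoint_every = checkpoint_every
        self.lr = lr
        self.pending = {}   # worker id -> gradient pushed in the current superstep
        self.epoch = 0
        # Centralized checkpoint: (epoch, copy of the model), kept only on the server.
        self.checkpoint = (0, dict(model))

    def pull(self):
        """Return a copy of the current model state."""
        return dict(self.model)

    def push(self, worker_id, gradient):
        """Buffer a gradient; update the model once every worker has pushed (BSP barrier)."""
        self.pending[worker_id] = gradient
        if len(self.pending) < self.num_workers:
            return False
        for key in self.model:
            avg = sum(g[key] for g in self.pending.values()) / self.num_workers
            self.model[key] -= self.lr * avg
        self.pending.clear()
        self.epoch += 1
        if self.epoch % self.checkpoint_every == 0:
            self.checkpoint = (self.epoch, dict(self.model))
        return True


class Worker:
    """Hypothetical worker; it pulls the current model as soon as it is brought up."""

    def __init__(self, worker_id, server):
        self.worker_id = worker_id
        self.server = server
        self.model = server.pull()


def superstep(server, workers, gradients):
    """One BSP round: all workers push, then all pull the updated model."""
    for worker, gradient in zip(workers, gradients):
        server.push(worker.worker_id, gradient)
    for worker in workers:
        worker.model = server.pull()
```

With this design a restarted or preempted worker needs no local recovery logic: constructing a new
Worker is enough to resynchronize it with the server. The sketch does not cover a worker that fails
in the middle of a superstep, nor how the data is redistributed to a replacement worker.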

> Initial version of distributed spark backend
> --------------------------------------------
>
>                 Key: SYSTEMML-2087
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2087
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: Matthias Boehm
>            Assignee: LI Guobao
>            Priority: Major
>
> This part aims to implement BSP (bulk synchronous parallel) for the distributed Spark backend.
> The idea is to be able to launch a remote parameter server and the workers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
